Statistical Analysis of Behavioral Data 



UNIT 13.8 



The statistical analysis of behavioral data follows the collection and checking of data, 
and is aimed at assessing the effect of treatments on the observed behaviors. In the 
following, the author will briefly describe different behavioral tests and the response 
variables collected in each test, to introduce the statistical methods for the analysis of 
such variables. A more detailed description of the tests and of the response variables can be 
found in specific units of Current Protocols in Toxicology. The most specialized statistical 
terms appearing in this unit are defined under Specialized Statistical Terms, below. 

When more than one statistical approach is available for the analysis of a particular
behavioral test, the different approaches will be described. They are often not redundant:
they yield different kinds of information and thereby contribute to the refinement of
behavioral research (Chiarotti and Puopolo, 2000). Where the approaches differ in
complexity, this will be highlighted in the corresponding paragraph. For a concise
description of the aim and characteristics of the statistical methods that can be used to
analyze the data from each behavioral test, see Table 13.8.2 at the end of this unit.



SPECIALIZED STATISTICAL TERMS 

Between-Subject, Within-Subject, and Repeated Measures Factors 

The between-subject factor is a factor that includes two or more levels, each of them ran- 
domly assigned to one group of independent statistical units (e.g., mice, rats, monkeys). 
Units in each final group are independent of each other and independent with respect to 
all units belonging to the other final groups. 

The within-subject factor is a factor that includes two or more levels, each of them 
assigned to one of two or more correlated groups of independent statistical units (e.g., 
mice, rats, monkeys). This means that units in each final group are independent of each 
other, while each unit in a final group is correlated to one unit in any other final group. 

The repeated measures factor is a factor that includes two or more levels, each of them 
assigned to each statistical unit. This means that each statistical unit is repeatedly tested 
under all levels of the repeated measures factor. 

Normality 

The assumption of normality implies that the distribution of the variable to be analyzed
by parametric tests (see Statistical Tests for the Analysis of Behavioral Data, below) is
normal in the population from which units (or blocks of units) are sampled. The normal
distribution is a continuous, unimodal frequency distribution of infinite range, fully
specified by its mean and standard deviation. The shape of a distribution, and hence its
departure from normality, is usually described by two indices: skewness and kurtosis.

Skewness describes the asymmetry of a distribution. It is equal to 0 for unimodal sym- 
metrical distributions, such as the normal distribution, whereas it is negative for unimodal 
distributions with a longer left tail (towards lower values of the variable) and positive for 
distributions with a longer right tail (towards higher values). 

Kurtosis describes the shape of a unimodal frequency curve in terms of the peakedness
around the mode and the heaviness of the tails. Usually, the kurtosis index is centered
(excess kurtosis) so that it is equal to 0 in normal distributions. Distributions with a
positive, zero, and negative kurtosis index are referred to, respectively, as "leptokurtic"
(heavy-tailed), "mesokurtic," and "platykurtic" (light-tailed).

The normality assumption also states that the treatments administered to the statistical
units in the different subgroups do not affect the shape of the distribution of the variable 
and that only the mean value of the variable changes among subgroups. 
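As an illustration only, the skewness and kurtosis indices described above can be computed from a sample with the simple moment-based estimators g1 and g2 (the latter centered so that it is about 0 for normal data). The function name and the data are hypothetical; statistical packages often report bias-adjusted versions of these indices, which differ slightly in small samples.

```python
def skewness_kurtosis(values):
    """Return (g1, g2): moment-based sample skewness and excess kurtosis.

    g2 is centered (kurtosis minus 3) so that it is about 0 for normal data.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("all values are equal; the indices are undefined")
    g1 = m3 / m2 ** 1.5        # > 0: longer right tail; < 0: longer left tail
    g2 = m4 / m2 ** 2 - 3.0    # > 0: leptokurtic; < 0: platykurtic
    return g1, g2


# Hypothetical latencies (s) with one long value in the right tail
g1, g2 = skewness_kurtosis([12.0, 15.5, 14.2, 60.0, 13.1])
```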

Homoscedasticity (Homogeneity of Variance) 

Homoscedasticity refers to the homogeneity of variance among the independent 
groups being compared. In experimental designs including between-subject factors, ho- 
moscedasticity must be evaluated by comparing variances among the groups based on 
factor levels. When more than one observation is collected on each statistical unit
(within-subject or repeated measures factors), the data in each block (corresponding to
within-subject factor levels) or in each unit (corresponding to repeated measures) should
first be averaged before testing for homoscedasticity.
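A minimal sketch of this procedure, assuming a design with one between-subject factor and repeated observations on each unit: the observations of each unit are first averaged, and Levene's W statistic is then computed on the unit means. The function names and data are hypothetical; W would be referred to an F distribution with (k - 1, N - k) degrees of freedom, a step this sketch leaves out.

```python
from statistics import mean, median


def unit_means(observations_by_unit):
    """Average the repeated observations of each unit (one value per unit)."""
    return [mean(obs) for obs in observations_by_unit]


def levene_statistic(groups, center="mean"):
    """Levene's W statistic for homogeneity of variance among k groups.

    center="mean" gives the original Levene test; center="median" gives the
    Brown-Forsythe variant. Under the null hypothesis, W is approximately
    F-distributed with (k - 1, N - k) degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    locate = mean if center == "mean" else median
    centers = [locate(g) for g in groups]
    # Absolute deviation of each value from the center of its own group
    z = [[abs(x - c) for x in g] for g, c in zip(groups, centers)]
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zg, zm in zip(z, z_means) for zij in zg)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical data: each inner list holds the repeated observations of one unit
control = unit_means([[10, 12, 11], [14, 15, 13], [9, 10, 11]])
treated = unit_means([[20, 26, 23], [8, 12, 10], [30, 33, 36]])
w, df = levene_statistic([control, treated])
```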

Sphericity 

The sphericity assumption states that, in the sampled population, the variances of the
differences between paired observations are equal for all pairs of conditions (levels of
the within-subject factor or repeated measures). A simpler yet stricter condition is that
referred to as "compound symmetry," which is met when, in the variance-covariance
matrix of the sampled population, all of the variances are equal and all of the covariances
are equal, although the covariances are not necessarily equal to the variances. This means
that the data collected on statistical units under different conditions (corresponding to
different levels of the within-subject factor or to different repeated measures) must be
equally related to each other with the same correlation coefficient, ρ. Compound symmetry
is sufficient, yet not necessary, for ensuring the validity of the F ratio under the general
null hypothesis of no treatment effect. In other words, if compound symmetry is satisfied,
then sphericity is also satisfied, but if compound symmetry is not satisfied, then sphericity
must still be evaluated. The F ratio is the test statistic in the parametric ANOVA. In
particular, it is computed as the ratio between the variance of the effect under assessment
and the variance of the error term, and it follows an F distribution if all the assumptions
upon which the test relies are met.
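To make the definition concrete, the following sketch (hypothetical names, data laid out as one row per unit and one column per condition) computes the sample variance of the differences for every pair of conditions. Under sphericity these variances are equal in the population, so markedly different sample values hint at a violation; this is a descriptive check, not a formal test such as Mauchly's.

```python
from itertools import combinations
from statistics import variance


def difference_variances(data):
    """Sample variance of the paired differences for every pair of conditions.

    data[i][j] is the observation of unit i under condition j (a level of a
    within-subject factor or a repeated measure). Under sphericity, all of
    these variances are equal in the population.
    """
    n_conditions = len(data[0])
    result = {}
    for a, b in combinations(range(n_conditions), 2):
        diffs = [row[a] - row[b] for row in data]
        result[(a, b)] = variance(diffs)
    return result


# Hypothetical data: 4 units (rows) tested under 3 conditions (columns)
scores = [[10, 12, 15], [8, 11, 11], [12, 13, 18], [9, 12, 14]]
for pair, v in difference_variances(scores).items():
    print(pair, v)
```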

FOX BATTERY TEST

The Fox battery test is a battery of tests that characterize maturation of a set of sensori- 
motor functions, behavioral traits, and learning abilities from birth to weaning, initially 
developed for the assessment of behavioral ontogeny of house mice (Fox, 1965). Three 
main problems arise in analyzing data from a Fox battery test. First, the original response 
variables are ordinal scores, representing successive developmental stages achieved by 
the experimental subjects. Specifically, for each item, four well-defined stages of mouse
development can be recognized and scored: (i) absence of response (score 0); (ii) initial
appearance of response (score 1); (iii) evident response (score 2); and (iv) full response
(score 3). Thus, the range of data is so narrow (scale from 0 to 3) that the data distribution
cannot be considered continuous. Therefore, neither parametric nor nonparametric tests
can be applied directly to the raw scores, because the former require that data be normally
distributed, and the latter cannot manage the large number of equal values (ties) coming
from the narrow range of scores.

Second, for a better evaluation of the effect of treatments (in a broad sense) on the 
ontogeny of behavior, littermates receive, when possible, different treatments (obviously, 
under some conditions, this is very difficult or even impossible, as in the case of treatments 
that must be administered prenatally). Splitting litters across different treatment groups
minimizes the confounding effect of the genetic variability between litters, which would
affect the assessment of treatment effect if whole litters were assigned to the same treatment
group. Furthermore, littermates are tested repeatedly over days to monitor the behavioral
development. From a statistical point of view, this results in experimental designs
including litters as the random blocking factor, treatments as the fixed factor within litters,
and postnatal days (PNDs) as the fixed repeated measures on individual subjects. Therefore,
it is not appropriate to use methods that assume the independence of observations, such as
the χ² test, frequently performed to assess the different distribution of the scores among
treatment groups day-by-day.

Finally, score profiles are expected to be monotonically nondecreasing, but sometimes 
they may present regressions — i.e., the score a subject achieves one day may be lower 
than the score achieved on the day before. This can be due to the subject's state at the
time of testing (e.g., digestion, sleep, or even boredom), which can affect its capability and
promptness in responding to stimuli.

Two solutions can be adopted to analyze Fox battery test data, as described, respectively,
in steps 1a to 6a and 1b to 4b, below.



Analysis of a synthetical measure of the score profile 

1a. Convert the original scores for each subject in each component of the test battery from
multiple (repeated observations over time for each subject) to single (one observation 
per subject), by computing a synthetical measure. 

To do this, different strategies can be adopted, as outlined below. Each of them presents 
both pros and cons. 

i. For each component of the test battery, choose a particular score (usually scores 2 
or 3) as adult-like response or full appearance of a somatic feature, and register the 
first day when the subject attains a score equal to or greater than that (for the sake 
of brevity, in the following this will be called first day of adult-like responding). 

To choose the abovementioned threshold score, the score profile in the period of testing, 
collected on subjects in the control group, must be considered. Specifically, a good choice 
for the threshold is a score that control subjects are likely to present some days after the 
beginning and before the end of the testing period. 

Pro: If the threshold is appropriately chosen, it is possible to detect anticipations or delays
of the full response, which can be attributed to the treatment(s) under study.

Con: If the threshold score is not set at the maximum score (e.g., at score 2), it is 
impossible to distinguish between subjects attaining the full response with different scores 
(e.g., with score 2 or score 3) on the same day of testing. Moreover, subjects attaining 
the full response on the same day of testing with the same score, but presenting different 
score profiles before and/or after the attainment of the full response, will be considered 
equivalent. 

ii. For each component of the test battery, compute the area under the curve generated 
by the repeated scores. 

Pro: The whole score profile is considered in computing this synthetical measure. Subjects
attaining the full response on the same day of testing with the same score, but presenting
different score profiles before and/or after the attainment of the full response, will be 
given different synthetical measures. 

Con: Different score profiles, even in the presence of different first days of adult-like 
responding, can result in the same synthetical measure. 
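The two synthetical measures of step 1a can be sketched as follows, assuming scores are recorded once per postnatal day. Function names and data are hypothetical, and the area under the curve is computed with the trapezoidal rule, which is one common choice rather than a method prescribed in this unit. The two example pups illustrate the Con of strategy ii: their first days of adult-like responding differ, yet their areas are identical.

```python
def first_day_at_threshold(days, scores, threshold):
    """Strategy i: first day with score >= threshold (None if never reached)."""
    for day, score in zip(days, scores):
        if score >= threshold:
            return day
    return None


def area_under_profile(days, scores):
    """Strategy ii: area under the score profile (trapezoidal rule)."""
    area = 0.0
    for i in range(1, len(days)):
        area += (days[i] - days[i - 1]) * (scores[i] + scores[i - 1]) / 2.0
    return area


# Hypothetical profiles of two pups tested on PND 3 to 7
days = [3, 4, 5, 6, 7]
pup_a = [0, 1, 2, 2, 2]
pup_b = [0, 2, 2, 1, 2]  # includes a regression on PND 6
print(first_day_at_threshold(days, pup_a, 2))  # 5
print(first_day_at_threshold(days, pup_b, 2))  # 4
print(area_under_profile(days, pup_a), area_under_profile(days, pup_b))  # 6.0 6.0
```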

2a. Check the normality of the distribution of transformed synthetical data in each 
subgroup of subjects, using the Shapiro-Wilk test or the Kolmogorov-Smirnov test, 
Lilliefors modification (Armitage et al., 2002; Chiarotti, 2004). If the normality
assumption is respected, proceed to step 3a; otherwise go to step 5a. 

3a. Compare variances among subgroups using the Levene test or the Bartlett test
(Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, proceed
to step 4a; otherwise go to step 5a.

4a. Perform, on the transformed synthetical data, the parametric tests for the comparison 
among groups appropriate for the modified experimental design (Edwards, 1985; 
Wilcox, 1987; Winer et al., 1991). Go to step 6a.

Remember that if the design includes repeated measures taken on the subjects, appropriate 
correction in case of violation of the sphericity assumption must be adopted, such as 
Greenhouse-Geisser or Huynh-Feldt corrections (Winer et al., 1991; Armitage et al., 
2002; Chiarotti, 2004). 
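The Greenhouse-Geisser correction multiplies the numerator and denominator degrees of freedom of the F ratio by an estimate of epsilon. A minimal pure-Python sketch of that estimate follows (hypothetical function name; data as one row per subject and one column per repeated measure), using the double-centered sample covariance matrix. Epsilon equals 1 under sphericity and has a lower bound of 1/(k - 1); the Huynh-Feldt adjustment is not shown.

```python
def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon from a subjects x conditions data matrix.

    eps = trace(S*)**2 / ((k - 1) * sum of squared elements of S*), where S*
    is the double-centered sample covariance matrix of the k conditions.
    """
    n, k = len(data), len(data[0])
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Sample covariance matrix between conditions (denominator n - 1)
    s = [[sum((row[i] - col_means[i]) * (row[j] - col_means[j]) for row in data) / (n - 1)
          for j in range(k)] for i in range(k)]
    row_means = [sum(s[i]) / k for i in range(k)]  # equal to column means (S is symmetric)
    grand_mean = sum(row_means) / k
    # Double-center S: subtract row and column means, add back the grand mean
    s_star = [[s[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
              for i in range(k)]
    trace = sum(s_star[i][i] for i in range(k))
    sum_sq = sum(v * v for r in s_star for v in r)
    return trace ** 2 / ((k - 1) * sum_sq)


# Hypothetical data: 5 subjects (rows) x 3 days (columns)
latency = [[50, 40, 20], [55, 30, 25], [60, 45, 15], [48, 35, 30], [58, 28, 22]]
eps = greenhouse_geisser_epsilon(latency)
# A corrected one-way repeated measures F test would use
# df1 = eps * (k - 1) and df2 = eps * (k - 1) * (n - 1)
```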

5a. Perform, on the transformed synthetical data, the nonparametric tests for the com- 
parison among groups appropriate for the modified experimental design (Marascuilo 
and McSweeney, 1977; Brunner et al., 2002). Go to step 6a.

Note that the most commonly used statistical packages do not include nonparametric 
methods for the analysis of experimental designs with between-subject and within-subject 
or repeated measures factors. Therefore, to perform the suggested analysis, one must 
have access to an ad hoc software application and to the know-how necessary for its use.
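For the simplest case only, where the modified design reduces to litters as blocks with exactly one subject per treatment in each litter, a rank-based option that is widely available is the Friedman test applied to the synthetical measures. Using it here is an assumption of this sketch, not the general method cited above; function names and data are hypothetical. The statistic is compared with a chi-square distribution with k - 1 degrees of freedom (not computed here), and ties, frequent with scores, are handled with average ranks and the usual tie correction.

```python
def average_ranks(values):
    """Ranks 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Tie-corrected Friedman statistic.

    blocks[i][j] is the synthetical measure of the pup from litter i that
    received treatment j (exactly one pup per treatment per litter).
    Returns (statistic, degrees of freedom).
    """
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):  # sum of t**3 - t over groups of tied values
            t = row.count(v)
            tie_term += t ** 3 - t
    numerator = 12 * sum(r ** 2 for r in rank_sums) - 3 * b ** 2 * k * (k + 1) ** 2
    denominator = b * k * (k + 1) - tie_term / (k - 1)
    return numerator / denominator, k - 1


# Hypothetical first days of adult-like responding: 4 litters x 3 treatments
litters = [[9, 10, 12], [8, 10, 11], [9, 9, 12], [10, 11, 11]]
q, df = friedman_statistic(litters)
```

The statistic is undefined (division by zero) if every litter shows all treatments tied.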

6a. Present the results (see Presentation of Results, below). 

Analysis of the whole score profile by survival curves 

1b. Choose the score to be considered as the response, i.e., the score after whose
attainment the subject will be ignored in the analysis; attaining that score will be
considered as death. Register, for each subject, the time from the first day of
observation until the first day when the subject attains a score equal to or greater than
the response; that time will be considered as time to death.

Subjects that do not reach the abovementioned score will be considered as censored, and
will be given a time equal to the span of observation (follow-up time).
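Step 1b and the censoring rule can be sketched as a small helper that turns one subject's score profile into a (time, event) pair. The function name and data are hypothetical; time is counted in days from the first observation day.

```python
def to_survival_record(days, scores, response_score):
    """Return (time, event) for one subject.

    event = 1 ("death"): time = days from the first observation day to the
    first day with score >= response_score.
    event = 0 (censored): the score was never reached, and time = the span of
    observation (follow-up time).
    """
    start = days[0]
    for day, score in zip(days, scores):
        if score >= response_score:
            return day - start, 1
    return days[-1] - start, 0


# Hypothetical pups observed from PND 3 to PND 7, response score = 2
days = [3, 4, 5, 6, 7]
print(to_survival_record(days, [0, 1, 1, 2, 3], 2))  # (3, 1)
print(to_survival_record(days, [0, 0, 1, 1, 1], 2))  # (4, 0)
```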

2b. Analyze survival data using methods such as the Kaplan-Meier survival analysis or 
the Cox proportional hazards model (Armitage et al., 2002). 

The Kaplan-Meier survival analysis method allows one to compare survival times among 
two or more independent groups and to take into account at most one stratification
variable. The Cox proportional hazards model allows one to assess the independent effect
on survival of several fixed or time-dependent covariates.
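For readers without a survival package at hand, the Kaplan-Meier (product-limit) estimate itself is short to compute. This sketch (hypothetical names and data) returns the estimated proportion of subjects that have not yet attained the response at each event time; 1 minus that value is the cumulative proportion that has. It follows the usual convention that subjects censored at a tied time are still at risk at that time. Group comparisons and the Cox model are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit estimate as a list of (time, S(t)) at each event time.

    events[i] is 1 if subject i attained the response at times[i] and 0 if
    it was censored. Censored subjects tied with an event time are still
    counted at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical (time, event) records of one treatment group
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3), round(1 - s, 3))  # time, S(t), proportion responded
```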

3b. Repeat the survival analysis using different scores as response. 

It is important to note that repeated analyses on the same data can result in an increase 
in the probability of type I error, and that appropriate corrections (such as the Bonferroni 
correction; Armitage et al., 2002) should be adopted.
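A minimal sketch of the Bonferroni correction for m repeated analyses: each p value is multiplied by m (capped at 1) or, equivalently, compared with alpha/m. The function name and the p values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjusted p values and significance flags for m analyses."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p <= alpha / m for p in p_values]
    return adjusted, significant


# Hypothetical p values from survival analyses with three response scores
adjusted, significant = bonferroni([0.01, 0.02, 0.04])
```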

Pro: The described methods allow one to take into account the whole score profile. 

Con: The described methods have been developed for independent data, and correlated 
data (such as those resulting from littermates assigned to different treatment groups) 
cannot be managed easily. 

More complex methods can deal with survival data organized in different stages, taking 
into account the successive survival times in each consecutive stage. Unfortunately, 
these are more difficult to manage, and many of the most common statistical software 
applications do not perform such methods. Therefore, unless one has access to the more
specialized software applications and to the know-how necessary for their use, it may be
advisable to refer to the kind of analysis suggested above in steps 1b to 3b.

4b. Present the results (see Presentation of Results, below). 

ULTRASOUND VOCALIZATIONS TEST

Ultrasound vocalizations are emitted by pups in response to various conditions, such as 
maternal separation, low-temperature isolation, tactile stimulation, or olfactory stimula- 
tion. These vocalizations stimulate prompt expression of maternal behavior. 

The response variable in the ultrasound vocalizations test (Hofer et al., 2001) is the 
number of calls emitted by the pups under various experimental conditions (between- or 
within-subject factor, according to the experimental design adopted). 

For a more sophisticated analysis, vocalizations can be characterized according to the fre- 
quency modulation and the duration of the acoustic signal (sound category), as evidenced 
by the spectrographic analysis of calls. Moreover, calls can be classified according to
different frequency bands. Each vocalization is then defined by a specific sound
category and frequency band. Therefore, the number of calls emitted by the pup in each 
sound category/frequency band combination can be considered as repeated measures on 
the pup. 

The statistical analysis of the response variable number of calls proceeds through the
following steps.

1. Check the normality of the distribution of data in each subgroup of subjects us- 
ing the Shapiro-Wilk test or the Kolmogorov-Smirnov test, Lilliefors modification 
(Armitage et al., 2002; Chiarotti, 2004). If the normality assumption is respected, 
proceed to step 2; otherwise go to step 4. 

2. Compare variances among subgroups using the Levene test or the Bartlett test 
(Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, proceed 
to step 3; otherwise go to step 4. 

3. Perform the parametric tests for the comparison among groups appropriate for the 
experimental design adopted (Edwards, 1985; Wilcox, 1987; Winer et al., 1991). Go 
to step 7. 

Remember that if the design includes repeated measures taken on the subjects, appropriate 
correction in case of violation of the sphericity assumption must be adopted, such as 
Greenhouse-Geisser or Huynh-Feldt corrections (Winer et al., 1991; Armitage et al., 
2002; Chiarotti, 2004). 

4. Consider the experimental design. If it is rather simple — i.e., no limits for the 
between-subject factors (treatments) and at most two levels for each repeated mea- 
sures factor (days, trials) — go to step 5; otherwise go to step 6. 

5. Analyze the profile on the consecutive days of testing and trials using a nonparametric 
test for the comparison among groups, choosing the test appropriate for the experi- 
mental design adopted — including between-subject factor(s) and repeated measures 
factor (days, trials) (Marascuilo and McSweeney, 1977; Brunner et al., 2002). Go to 
step 7. 

Remember that the most commonly used statistical packages do not include nonparametric 
methods for the analysis of experimental designs with between-subject and within-subject 
or repeated measures factors. Therefore, to perform the suggested analysis, one must 
have access to an ad hoc software application and to the know-how necessary for its use.

6. Transform the original data for number of calls using the appropriate normalizing
transformation, which must be applied separately to each observation (Edwards,
1985; Armitage et al., 2002).

Remember that the appropriate transformation must be chosen on the basis of some
knowledge of the theoretical and empirical distributions of data (i.e., the distributions of
data in the population and in the observed samples, respectively). Note that non-normality
of the number of calls usually results from too small a number of calls under some
experimental conditions, which suggests a Poisson distribution of data. Thus, the
square-root transformation is the most appropriate for normalizing the data distribution.
After the transformation, go back to step 1. Note that persistent violations of the
assumptions of normality and homogeneity of variance after the transformation of data
indicate that the performed transformation was not appropriate.
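The square-root transformation is applied observation by observation, as required in step 6. A minimal sketch follows; the function name and data are hypothetical, and the optional offset reflects common variants such as sqrt(x + 0.5) or sqrt(x + 3/8), used when many counts are zero or very small, which this unit does not itself prescribe.

```python
from math import sqrt


def sqrt_transform(counts, offset=0.0):
    """Square-root transformation applied separately to each count.

    offset=0.0 gives sqrt(x); offset=0.5 or 0.375 give common variants for
    data with many zero or very small counts.
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    return [sqrt(c + offset) for c in counts]


# Hypothetical numbers of calls emitted by five pups in one subgroup
calls = [0, 3, 4, 9, 25]
transformed = sqrt_transform(calls)
```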

7. Present the results (see Presentation of Results, below). 



BASIC PASSIVE AVOIDANCE TEST 

Usually, in the passive avoidance test, the experimental subjects undergo repeated trials
until they reach the criterion (which is, commonly, suppressing the spontaneous response 
for two consecutive trials), or, in case they do not reach the criterion, until they undergo 
the predetermined maximum number of trials. The first phase of learning (acquisition) is 
usually followed by an interval, and then subjects undergo one single retest trial to assess 
the long-term memory (memory retention). 

Different kinds of data can be collected in a passive avoidance test, namely (1) number 
of trials to reach the criterion, (2) latency profile in the repeated trials of the learning 
phase, and (3) latency to step-through in the retest trial. 

All the response variables are characterized by the presence of a cutoff that limits the 
range of values the response variable can take. In particular, latencies longer than the
cutoff value are commonly, and arbitrarily, assigned the cutoff value. This value is also used
in the learning phase to complete the latency profile for the trials following the attainment
of the criterion (usually, these trials are not performed). For these reasons, data usually do
not follow a normal distribution. Thus, parametric methods, which use statistics based on
the quantitative value of the different observations, are inappropriate.

Different solutions can be adopted for the analysis of the various response variables.

Number of trials to reach the criterion

This variable can be subjected to two different analyses, as described in steps 1a to 6a
and 1b to 4b, respectively.

Tests for the comparison among groups (most frequently used) 

1a. Check for the presence of cutoff values. If they are present, go to step 5a; otherwise
go to step 2a. 

2a. Check the normality of the distribution of data using the Shapiro-Wilk test or the 
Kolmogorov-Smirnov test, Lilliefors modification (Armitage et al., 2002; Chiarotti,
2004). If data are normally distributed, go to step 3a; otherwise go to step 5a. 

3a. Compare variances among subgroups of subjects using the Levene or the Bartlett 
test (Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, go to 
step 4a; otherwise go to step 5a. 

4a. Perform a parametric test for the comparison among groups, choosing the test ap- 
propriate for the experimental design adopted (Edwards, 1985; Wilcox, 1987; Winer 
et al., 1991). Go to step 6a.

5a. Perform nonparametric tests for the comparison among groups, choosing the test
appropriate for the experimental design adopted (Marascuilo and McSweeney, 1977; 
Brunner et al., 2002). Go to step 6a. 
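When the design has a single between-subject factor (treatment), a standard nonparametric choice for step 5a is the Kruskal-Wallis test; naming it here is an assumption, since this unit cites general references instead. The following sketch (hypothetical names and data) computes H with average ranks and the tie correction, which matters because cutoff values produce many ties; H is referred to a chi-square distribution with k - 1 degrees of freedom, which is not computed here.

```python
def average_ranks(values):
    """Ranks 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H; returns (H, degrees of freedom)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:  # sum over groups of (rank sum)**2 / group size
        r = ranks[start:start + len(g)]
        total += sum(r) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1


# Hypothetical numbers of trials to criterion, cutoff = 10 trials
control = [3, 4, 4, 5, 10]
treated = [6, 8, 10, 10, 10]
h, df = kruskal_wallis([control, treated])
```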

6a. Present the results (see Presentation of Results, below). 

Survival analysis (less commonly used) 

1b. Set the status of the subject equal to dead if the subject has reached the criterion
within the maximum number of trials, or censored, if not. 

2b. Set the survival time equal to the number of trials to reach the criterion (if the subject 
is dead) or to the maximum number of trials (if the subject is censored). 

3b. Perform the Kaplan-Meier survival analysis (Armitage et al., 2002) on these data,
using treatment(s) as grouping variable(s). 
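The Kaplan-Meier curves of two treatment groups are commonly compared with the log-rank test; this unit does not name a specific test, so its use here is an assumption. The sketch (hypothetical names and data) computes the log-rank chi-square statistic with 1 degree of freedom and its p value from the complementary error function, since a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
from math import erfc, sqrt


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, p value)."""
    event_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk just before t
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, erfc(sqrt(chi2 / 2.0))  # P(chi-square(1) > chi2)


# Hypothetical trials to criterion (max 9 trials; event 0 = censored at 9)
chi2, p = logrank_two_groups([1, 1, 2, 2, 3], [1, 1, 1, 1, 1],
                             [5, 6, 7, 8, 9], [1, 1, 1, 1, 0])
```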

4b. Present the results (see Presentation of Results, below). 

Latency profile in the repeated trials 

If the experimental subject reaches the criterion during the acquisition phase before 
undergoing the maximum number of trials, it is usually given the maximum latency for 
the remaining trials. The latency profile is then strongly affected by the limits arbitrarily 
set by the researcher. For this reason, parametric methods are frequently not appropriate 
for the analysis of this kind of data. On the other hand, nonparametric methods taking 
into account between-subject and repeated measures factors at the same time are not 
available in the most commonly used statistical packages. 

1c. Convert the whole latency profile into a single response variable, using one of the
two following methods. 

i. Calculating the median latency value. 

ii. Calculating the area under the latency curve. 
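Step 1c can be sketched as follows, assuming that recorded latencies are capped at the cutoff and that trials not performed after the criterion receive the cutoff value, as described above. Function names and data are hypothetical; the area is computed with the trapezoidal rule and unit spacing between trials.

```python
from statistics import median


def complete_profile(latencies, max_trials, cutoff):
    """Cap latencies at the cutoff and give trials not performed the cutoff."""
    capped = [min(x, cutoff) for x in latencies]
    return capped + [cutoff] * (max_trials - len(capped))


def profile_area(profile):
    """Area under the latency curve (trapezoidal rule, unit trial spacing)."""
    return sum((a + b) / 2.0 for a, b in zip(profile, profile[1:]))


# Hypothetical subject: criterion reached at trial 4 of 6, cutoff = 300 s
profile = complete_profile([12, 150, 300, 300], max_trials=6, cutoff=300)
print(profile)                 # [12, 150, 300, 300, 300, 300]
print(median(profile))         # 300.0
print(profile_area(profile))   # 1206.0
```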

2c. Analyze the single response variable using tests for the comparison among groups. 
To do this, follow the instructions given for the variable number of trials to reach the 
criterion (see steps 1a to 5a).

3c. Present the results (see Presentation of Results, below). 

Latency to step-through in the retest trial 

Subjects that have learned the task in the acquisition phase usually show memory of 
the task in the retest trial, unless the administered treatments interfere with memory 
processes. For this reason, latency in the retest trial is often affected by the presence 
of cutoff values which, together with the non-normality of the distribution common to 
most latency variables, make parametric tests not suitable for the statistical analysis. 
Therefore, nonparametric tests for the comparison among groups are usually the best 
choice, as described in steps 1d to 6d.

1d. Check for the presence of cutoff values. If they are present, go to step 5d; otherwise
go to step 2d. 

2d. Check the normality of the distribution of data, using the Shapiro-Wilk test or the 
Kolmogorov-Smirnov test, Lilliefors modification (Armitage et al., 2002; Chiarotti,
2004). If data are normally distributed, go to step 3d; otherwise go to step 5d. 

3d. Compare variances among subgroups of subjects, using the Levene or the Bartlett 
test (Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, go to 
step 4d, otherwise go to step 5d. 

4d. Perform a parametric test for the comparison among groups, choosing the test ap-
propriate for the experimental design adopted (Edwards, 1985; Wilcox, 1987; Winer
et al., 1991). Go to step 6d.

5d. Perform nonparametric tests for the comparison among groups, choosing the test 
appropriate for the experimental design adopted (Marascuilo and McSweeney, 1977; 
Brunner et al., 2002). Go to step 6d. 

6d. Present the results (see Presentation of Results, below). 

MORRIS WATER MAZE TEST

Usually, the Morris water maze test consists of an acquisition phase (learning) followed
by a probe trial (retention). The acquisition phase lasts for several days, and on each day
of testing, the experimental subjects undergo repeated trials, spaced by an intertrial
interval. The probe trial is commonly conducted a few minutes after the completion of the
last training trial, on the last day of the acquisition phase. For more details, see UNIT 11.3.

The duration of the acquisition phase (number of days, number of trials on each day, 
maximum duration of each trial, and intertrial interval), the duration of the period be- 
fore the probe trial, and the duration of the probe trial itself are predetermined by the 
experimenter. These constitute the cutoffs for the different response variables that can be 
collected in the test. 

The response variables that are collected in the acquisition phase are: (1) latency profile 
to reach the platform; (2) mean velocity, in the repeated days and trials within days; (3) 
path length, in the repeated days and trials within days; (4) time spent in the peripheral 
annular area close to the wall (thigmotaxis), in the repeated days and trials within days. 

The response variables that are collected in the probe trial are: (1) crossings of acquisition 
quadrant; and (2) total time spent in the acquisition quadrant. 

Latency profile to reach the platform 

Usually, if the experimental subject does not find the platform within the allotted time, the 
experimenter manually places it onto the platform and lets it stay there for a predetermined 
period (seconds). The latency profile is then strongly affected by the limits arbitrarily set 
by the researcher. For this reason, as with passive avoidance data, the latency profile to 
reach the platform cannot be appropriately analyzed using parametric methods. On the 
other hand, nonparametric methods that take into account, at the same time, between- 
subject and repeated measures factors (days, trials) with more than two levels each are 
not available in the most common statistical packages. Therefore, proceed as follows. 

1a. Convert the latency profile in each day of testing into a single response variable,
using one of the two following methods. 

i. Calculating the median latency value. 

ii. Calculating the area under the latency curve. 
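For step 1a, the per-day conversion can be sketched as follows, assuming latencies are stored per subject as a mapping from day to the list of trial latencies (in seconds). Names and data are hypothetical; the area uses the trapezoidal rule with unit spacing between trials.

```python
from statistics import median


def daily_summaries(latencies_by_day):
    """Map {day: [trial latencies]} to {day: (median, area under curve)}.

    The area uses the trapezoidal rule with unit spacing between trials.
    """
    summaries = {}
    for day, trials in sorted(latencies_by_day.items()):
        area = sum((a + b) / 2.0 for a, b in zip(trials, trials[1:]))
        summaries[day] = (median(trials), area)
    return summaries


# Hypothetical subject: 4 trials per day, 60 s cutoff
subject = {1: [60, 60, 45, 30], 2: [40, 25, 20, 15]}
print(daily_summaries(subject))  # {1: (52.5, 150.0), 2: (22.5, 72.5)}
```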

2a. Check the normality of the distribution of data using the Shapiro-Wilk test or the 
Kolmogorov-Smirnov test, Lilliefors modification (Armitage et al., 2002; Chiarotti,
2004). If data are normally distributed, go to step 3a; otherwise go to step 5a. 

3a. Compare variances among subgroups of subjects using the Levene or the Bartlett 
test (Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, go to 
step 4a; otherwise go to step 5a. 



13.8.8 

Supplement 25 



Current Protocols in Toxicology 



4a. Analyze the profile on the consecutive days of testing using a parametric test for the 
comparison among groups, choosing the test appropriate for the experimental design 
adopted, including between-subject factor(s) and the repeated measures factor (days) 
(Edwards, 1985; Wilcox, 1987; Winer et al., 1991). Go to step 6a.

5a. Analyze the profile in the consecutive days of testing using a nonparametric test for 
the comparison among groups, choosing the test appropriate for the experimental 
design adopted (Marascuilo and McSweeney, 1977; Brunner et al., 2002). Go to 
step 6a. 

Remember that the most commonly used statistical packages do not include nonparametric 
methods for the analysis of experimental designs with between-subject and within-subject 
or repeated measures factors. Therefore, to perform the suggested analysis, one must 
have access to an ad hoc software application and to the know-how necessary for its use.

6a. Present the results (see Presentation of Results, below). 



Mean velocity and path length in the repeated days and trials within days 

These quantitative variables are unbounded, in that no arbitrary cutoffs are set by the
experimenter. To analyze these variables, proceed as follows.

1b. Check the normality of the distribution of data using the Shapiro-Wilk test or the
Kolmogorov-Smirnov test, Lilliefors modification (Armitage et al., 2002; Chiarotti,
2004). If data are normally distributed, go to step 2b, otherwise go to step 4b. 

2b. Compare variances among subgroups of subjects using the Levene or the Bartlett 
test (Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, go to 
step 3b, otherwise go to step 4b. 

3b. Analyze the profile on the consecutive days of testing and trials using a parametric test 
for the comparison among groups, choosing the test appropriate for the experimental 
design adopted, including between-subject factor(s) and repeated measures factor
(days and trials) (Edwards, 1985; Wilcox, 1987; Winer et al., 1991). Go to step 7b.

4b. Consider the experimental design. If it is rather simple — i.e., no limits for the 
between-subject factors (treatments), two levels at most for each repeated measures 
factor (days and trials) — go to step 5b; otherwise go to step 6b. 

5b. Analyze the profile on the consecutive days of testing and trials using a nonpara- 
metric test for the comparison among groups, choosing the test appropriate for the 
experimental design adopted, including between-subject factor(s) and repeated measures
factor (days and trials) (Marascuilo and McSweeney, 1977; Brunner et al., 2002). Go to
step 7b.

Remember that the most commonly used statistical packages do not include nonparametric 
methods for the analysis of experimental designs with between-subject and within-subject 
or repeated measures factors. Therefore, to perform the suggested analysis, one must 
have access to an ad hoc software application and to the know-how necessary for its use.

6b. Transform data using the appropriate normalizing transformation, which must be 
applied separately on each observation (Edwards, 1985; Armitage et al., 2002). 
After the transformation, go back to step 1b.

Remember that the appropriate transformation must be chosen on the basis of some 
knowledge with regard to the theoretical and empirical distributions of data (i.e., the 
distributions of data in the population and in the observed samples, respectively). Usually,
the square-root transformation is appropriate if the variances in each treatment group are
proportional to the means, while the logarithmic transformation is appropriate when the
standard deviations are proportional to the means or



Current Protocols in Toxicology 



13.8.9 

Supplement 25 



in the case of strongly asymmetrical distribution, with a long upper tail. Note that per- 
sistent violation of the assumptions of normality and homogeneity of variance after the 
transformation of data indicate that the transformation performed was not appropriate. 

7b. Present the results (see Presentation of Results, below). 
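The decision logic of steps 1b to 6b can be sketched as a small function. This is a minimal sketch, assuming the normality and variance tests have already been run and their outcomes reduced to booleans at a chosen significance level; the function name `next_step` and its arguments are hypothetical and not part of any statistical package.

```python
def next_step(normal, homogeneous, repeated_levels):
    """Return the action prescribed by steps 1b-6b for one response variable.

    normal:          True if the normality test (e.g., Shapiro-Wilk) is not significant
    homogeneous:     True if the variance test (e.g., Levene) is not significant
    repeated_levels: number of levels of each repeated measures factor,
                     e.g. {"days": 5, "trials": 4}
    """
    if normal and homogeneous:
        # Step 3b: parametric analysis of the repeated measures profile
        return "parametric"
    # Step 4b: the design is "rather simple" if no repeated factor has more than two levels
    if all(levels <= 2 for levels in repeated_levels.values()):
        # Step 5b: nonparametric analysis (requires ad hoc software)
        return "nonparametric"
    # Step 6b: transform each observation and restart from step 1b
    return "transform and recheck"


print(next_step(True, True, {"days": 5, "trials": 4}))    # parametric
print(next_step(False, True, {"days": 2, "trials": 2}))   # nonparametric
print(next_step(True, False, {"days": 5, "trials": 4}))   # transform and recheck
```

The variance outcome is only consulted when the data are normal, mirroring the order of steps 1b and 2b.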

Time spent in the peripheral annular area close to the wall (thigmotaxis) in the 
repeated days and trials within days 

This quantitative variable rarely reaches the cutoff value, except when the experimental
subject remains in the peripheral annular area close to the wall for the whole duration of
the trial, predetermined by the experimenter. Therefore, follow the instructions given for
the variables mean velocity and path length (see steps 1b to 7b).

Crossings of acquisition quadrant 

This quantitative variable has no fixed upper bound common to all subjects. Indeed, its
upper limit depends on both the predetermined duration of the probe trial (equal for all
subjects) and the maximum swimming velocity that the experimental subjects can reach
(different among subjects), and thus it varies from subject to subject. Therefore, follow the
instructions given for the variables mean velocity and path length (steps 1b to 7b).

Total time spent in the acquisition quadrant 

This quantitative variable rarely reaches the cutoff value, except when the experimental 
subject remains in the acquisition quadrant for the whole duration of the probe trial, 
predetermined by the experimenter. Therefore, follow the instructions given for the variables
mean velocity and path length (steps 1b to 7b).
SPATIAL OPEN FIELD TEST

The spatial open field test usually consists of seven repeated sessions, during which
animals are placed in an open-field arena. In session 1 the arena is empty; in sessions 2 to
4, four different plastic objects are placed in the arena; in sessions 5 and 6, two of these
objects are displaced; in session 7, one of the objects is replaced by an unfamiliar one.

Response variables in the spatial open field test can be assigned to two main categories: 
general behaviors, such as rearing, wall rearing, grooming, top rearing, and locomotor 
activity (crossings), which are recorded during all sessions; and behaviors specifically 
directed towards the objects, such as object contacts, which are recorded only in the 
sessions when objects are presented to subjects. 

For both categories, data recorded are frequencies and, in some cases, durations of 
behaviors — i.e., data are quantitative. Therefore, to analyze these variables, proceed as 
follows. 

Univariate methods 

1a. Check the normality of the distribution of data using the Shapiro-Wilk test or the
Kolmogorov-Smirnov test, Lilliefors modification (Armitage et al., 2002; Chiarotti, 
2004). If data are normally distributed, go to step 2a; otherwise go to step 4a. 

2a. Compare variances among subgroups of subjects using the Levene or the Bartlett 
test (Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, go to 
step 3a; otherwise go to step 4a.

3a. Analyze the profile over the consecutive sessions using a parametric test for
comparison among groups, choosing the test appropriate for the experimental design
adopted, including between-subject factor(s) and repeated measures factor(s) (sessions
and, where applicable, blocks of minutes) (Edwards, 1985; Wilcox, 1987; Winer et al.,
1991). Go to step 7a.
4a. Consider the experimental design. If it is rather simple, i.e., with no limit on the
between-subject factors (treatments) and at most two levels for each repeated measures
factor (sessions and, where applicable, blocks of minutes), go to step 5a; otherwise go to
step 6a.

5a. Analyze the profile over the consecutive sessions using a nonparametric test for
comparison among groups, choosing the test appropriate for the experimental design
adopted, including between-subject factor(s) and repeated measures factor(s) (sessions
and, where applicable, blocks of minutes) (Marascuilo and McSweeney, 1977; Brunner et
al., 2002). Go to step 7a.

Remember that the most commonly used statistical packages do not include nonparametric
methods for the analysis of experimental designs with between-subject and within-subject
or repeated measures factors. Therefore, to perform the suggested analysis, one must
have access to an ad hoc software application and the know-how necessary for its use.

6a. Transform data using the appropriate normalizing transformation, which must be
applied separately to each observation (Edwards, 1985; Armitage et al., 2002).
After the transformation, go back to step 1a.

Remember that the appropriate transformation must be chosen on the basis of some
knowledge of the theoretical and empirical distributions of the data (i.e., the
distributions of data in the population and in the observed samples, respectively).
Usually, the square-root transformation is appropriate if the variances in each treatment
group are proportional to the group means, while the logarithmic transformation is
appropriate if the standard deviations are proportional to the means, or in the case of a
strongly asymmetrical distribution with a long upper tail. Note that persistent violation
of the assumptions of normality and homogeneity of variance after the transformation of
data indicates that the transformation performed was not appropriate.

7a. Present the results (see Presentation of Results, below). 
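The choice between square-root and logarithmic transformations can be made somewhat more objective by looking at how the spread grows with the mean across treatment groups: the square root stabilizes variances that are proportional to the means, and the logarithm stabilizes standard deviations that are proportional to the means. The sketch below is a swapped-in rule of thumb (the power-law reasoning found in many textbooks, not a procedure from this unit): it fits the slope b of log(SD) against log(mean) and suggests the power 1 - b. The function name and the 0.25 tolerance are arbitrary assumptions.

```python
import math
from statistics import mean, stdev


def suggest_transformation(groups, tol=0.25):
    """Suggest a variance-stabilizing transformation from group means and SDs.

    Fits log(SD) = a + b * log(mean) by least squares across treatment groups.
    If SD grows like mean**b, the power 1 - b stabilizes the variance:
    power near 0.5 -> square root, near 0 -> logarithm, near 1 -> none.
    Each group needs at least two observations, a positive mean, and a positive SD.
    """
    xs = [math.log(mean(g)) for g in groups]
    ys = [math.log(stdev(g)) for g in groups]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    power = 1 - b
    for target, label in ((0.5, "square root"), (0.0, "logarithm"), (1.0, "none")):
        if abs(power - target) < tol:
            return b, label
    return b, f"other power ({power:.2f})"


# Hypothetical groups whose SD is proportional to the mean (10 -> 1, 20 -> 2, 40 -> 4)
print(suggest_transformation([[9, 10, 11], [18, 20, 22], [36, 40, 44]]))
# Hypothetical groups whose variance is proportional to the mean (4, 16, 64)
print(suggest_transformation([[2, 4, 6], [12, 16, 20], [56, 64, 72]]))
```

With only a few treatment groups the slope is poorly estimated, so the suggestion should be checked by rerunning the normality and variance tests after transforming.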
Multivariate methods 

1b. When many behavioral categories are recorded at the same time, perform a principal
component analysis (PCA) on all quantitative observations to take into account the
correlations among them (Jackson, 1991; Armitage et al., 2002).

The statistical unit for PCA is the subject at a session. For example, if there are two levels
of treatment (vehicle and drug), with 10 subjects per treatment level (for a total of 20
subjects), and 7 sessions, the resulting number of statistical units for PCA will be
2 x 10 x 7 = 140. On each statistical unit, all different behaviors are recorded, making up
the data set to be analyzed by PCA.

2b. Extract PCA factors and consider only the subset of factors having an eigenvalue
(corresponding to the variance of the original data explained by that factor) > 1.

3b. Check the cumulative proportion of variance in data space explained by that subset 
of factors. 

Note that PCA is effective in parsimoniously describing a set of continuous, interrelated
variables when the number of subset factors is much lower (i.e., <50%) than the number of
the original variables and the cumulative proportion of explained variance is sufficiently
large (i.e., >0.67 = 67%).

4b. To interpret the subset factors, look at the correlations between them and the original 
variables, given by the unrotated factor loadings. 

As a rule of thumb, a correlation between the original variable x and the factor z > 0.5 (or
< -0.5) indicates that the factor z is affected by the original variable x. Correlations closer
to zero denote that the variable is less important in influencing the factor.

5b. Collect the estimated factor scores for each statistical unit on each subset factor. 
6b. Analyze the factor scores, separately for each subset factor, with a parametric test
for the comparison among groups, choosing the test appropriate for the experimental 
design adopted. 

In this example, the design includes treatment as a between-subject factor, subjects as a
random blocking factor nested under treatment, and sessions as a repeated measures factor
(Edwards, 1985; Wilcox, 1987; Winer et al., 1991).

7b. Present the results (see Presentation of Results, below). 
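Steps 1b to 5b can be sketched with NumPy. This is a minimal sketch, assuming NumPy is available and that the data are arranged with one row per statistical unit (subject at a session) and one column per behavior; the PCA is run on the correlation matrix, which is the setting in which the eigenvalue > 1 rule makes sense. The function name `pca_kaiser` and the example data are hypothetical, and the signs of loadings and scores are arbitrary.

```python
import numpy as np


def pca_kaiser(X):
    """PCA on the correlation matrix, keeping factors with eigenvalue > 1.

    X: array of shape (n_units, n_behaviors), one row per subject-session.
    Returns all eigenvalues, the cumulative proportion of variance explained by
    the retained factors, the unrotated loadings (variable-factor correlations),
    and the standardized factor scores of every statistical unit.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardized behaviors
    R = np.corrcoef(X, rowvar=False)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                # ascending order
    order = np.argsort(eigvals)[::-1]                   # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.sum(eigvals > 1.0))                      # step 2b: eigenvalue > 1
    cum_prop = eigvals[:k].sum() / eigvals.sum()        # step 3b
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])    # step 4b
    scores = Z @ eigvecs[:, :k] / np.sqrt(eigvals[:k])  # step 5b: unit-variance scores
    return eigvals, cum_prop, loadings, scores


# Hypothetical counts of three behaviors on eight subject-sessions
X = [[1, 2, 5], [2, 4, 3], [3, 5, 6], [4, 8, 2],
     [5, 10, 7], [6, 13, 4], [7, 14, 8], [8, 16, 1]]
eigvals, cum_prop, loadings, scores = pca_kaiser(X)
print(np.round(eigvals, 3), round(float(cum_prop), 3))
print(np.round(loadings, 2))
```

For the example in the text, X would have 2 x 10 x 7 = 140 rows; the columns of `scores` are the factor scores analyzed in step 6b.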



BASIC 
PROTOCOL 6 



MATERNAL BEHAVIOR ASSESSMENT 

The maternal behavior assessment usually consists of recording the dams' behavior in
the presence of their offspring. For more details, see unit 13.9.

The response variables can be assigned to two main categories: (1) maternal behavior 
observed in the home cage, such as retrieval, nursing (arched-back, blanket, passive), 
pup care (licking, anogenital licking, stepping), dam self-care (self-grooming, eating, 
drinking), and other (digging, rearing, moving, resting, standing out of nest); and (2) 
maternal behavior in a novel cage, including retrieving and standing on nest, nursing, 
licking, digging, and self-grooming. 

Methods for the analysis of maternal behavior data depend on type of data. Steps are 
presented for dichotomous variables which are scored as either 0 or 1, as well as for 
quantitative variables (also see unit 13.9). 

For dichotomous variables 

Original data are recorded by instantaneous sampling; thus, they are dichotomous
scores (0/1). In particular, score 0 is assigned when the behavior is not shown in the
interval (seconds) of observation, while score 1 is assigned when the behavior is performed.
Instantaneous sampling is repeated several times (preferably at least eight times) at each
time point; time points (hours) are repeated (usually five times) on all days of testing
(days). Such data can be (1) maintained as dichotomous scores, or (2) transformed into
a quantitative variable. This choice influences the methods that can be adopted for the
statistical analysis.

Maintain dichotomous scores 

1a. Convert the original dichotomous scores (0/1), relative to each instantaneous sample,
into an overall dichotomous score (0/1 , absence/presence of the behavior) relative to 
each time point (hour) of observation. 

2a. Analyze the variable using a stepwise forward logistic regression (LR) analysis 
(Hosmer and Lemeshow, 2000). 

In particular, the dichotomous variable (0/1) is the dependent variable in the LR model,
while treatments (categorical), hours, and days of observation (continuous) are the
putative explanatory variables.

Observations repeated on the same animals over days and hours are related to (dependent
on) each other; this dependency must be taken into account by specifying groups (clusters)
of dependent observations, one cluster per animal.

3a. Present the results (see Presentation of Results, below). 
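Both ways of reducing the instantaneous samples recorded within one time point (step 1a above, and step 1b of the alternative route described next) amount to simple operations on the 0/1 scores. A minimal sketch, assuming that "presence" in step 1a means the behavior was shown in at least one instantaneous sample; the function name is hypothetical.

```python
def reduce_time_point(samples):
    """Reduce the 0/1 instantaneous scores recorded within one time point (hour).

    Returns (presence, count):
      presence: overall 0/1 score for the time point (step 1a), here 1 if the
                behavior was shown in at least one instantaneous sample
      count:    number of samples in which the behavior was shown (step 1b),
                ranging from 0 to len(samples)
    """
    if any(s not in (0, 1) for s in samples):
        raise ValueError("instantaneous scores must be 0 or 1")
    count = sum(samples)
    return int(count > 0), count


print(reduce_time_point([0, 1, 0, 0, 1, 0, 0, 0]))  # (1, 2)
print(reduce_time_point([0] * 8))                   # (0, 0)
```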
Transform dichotomous scores into quantitative variables

1b. Transform the original dichotomous scores (0/1), relative to each instantaneous
sample, into a quantitative variable by adding the original scores over the instantaneous
samples within each time point.

These observations (one per time point) will range from 0 (behavior never shown) to the 
maximum number of instantaneous samples within the time point (e.g., 8). 

2b. Analyze quantitative data using univariate (steps 3c to 9c) and/or multivariate meth- 
ods (steps 3d to 9d). 

Univariate methods 

3c. Check the normality of the distribution of data using the Shapiro-Wilk test or the 
Kolmogorov-Smirnov test, Lilliefors modification (Armitage et al., 2002; Chiarotti, 
2004). If data are normally distributed, go to step 4c; otherwise go to step 6c. 

4c. Compare variances among subgroups of subjects using the Levene or the Bartlett 
test (Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, go to 
step 5c; otherwise go to step 6c. 

5c. Analyze the profile on the consecutive days and hours of testing using a paramet- 
ric test for the comparison among groups, choosing the test appropriate for the 
experimental design adopted, including between-subject factor(s) and repeated mea- 
sures factor (days, hours) (Edwards, 1985; Wilcox, 1987; Winer et al., 1991). Go to 
step 9c. 

6c. Consider the experimental design. If it is rather simple — i.e., no limits for the 
between-subject factors (treatments), two levels at most for each repeated measures 
factor (days, hours) — go to step 7c; otherwise go to step 8c. 

7c. Analyze the profile on the consecutive days and hours of testing using a
nonparametric test for the comparison among groups, choosing the test appropriate for the
experimental design adopted, including between-subject factor(s) and repeated measures
factor (days, hours) (Marascuilo and McSweeney, 1977; Brunner et al., 2002).
Go to step 9c.

Remember that the most commonly used statistical packages do not include nonparametric
methods for the analysis of experimental designs with between-subject and within-subject
or repeated measures factors. Therefore, to perform the suggested analysis, one must
have access to an ad hoc software application and the know-how necessary for its use.

8c. Transform data using the appropriate normalizing transformation, which must be
applied separately to each observation (Edwards, 1985; Armitage et al., 2002).
After the transformation, go back to step 3c.

Remember that the appropriate transformation must be chosen on the basis of some
knowledge of the theoretical and empirical distributions of the data (i.e., the
distributions of data in the population and in the observed samples, respectively).
Usually, the square-root transformation is appropriate if the variances in each treatment
group are proportional to the group means, while the logarithmic transformation is
appropriate if the standard deviations are proportional to the means, or in the case of a
strongly asymmetrical distribution with a long upper tail. Note that persistent violation
of the assumptions of normality and homogeneity of variance after the transformation of
data indicates that the transformation performed was not appropriate.

9c. Present the results (see Presentation of Results, below). 
Multivariate methods

3d. Perform a principal component analysis (PCA) on all quantitative observations
(Jackson, 1991; Armitage et al., 2002).

The statistical unit for PCA is the subject at a given day and time point (hour) of
observation. For example, if there are two levels of treatment (vehicle and drug), 10
subjects per treatment level (for a total of 20 subjects), 13 days of observation, and 5
time points per day (65 repeated observations on each subject), the resulting number of
statistical units for PCA will be 2 x 10 x 13 x 5 = 1300. On each statistical unit, all
different behaviors are recorded, making up the data set to be analyzed by PCA.

4d. Extract PCA factors and consider only the subset of factors having an eigenvalue
(corresponding to the variance of the original data explained by that factor) > 1.

5d. Check the cumulative proportion of variance in data space explained by that subset 
of factors. 

Note that PCA is effective in parsimoniously describing a set of continuous, interrelated
variables when the number of subset factors is much lower (i.e., <50%) than the number of
the original variables, and the cumulative proportion of explained variance is sufficiently
large (i.e., >0.67 = 67%).

6d. To interpret the subset factors, look at the correlations between them and the original 
variables, given by the unrotated factor loadings. 

As a rule of thumb, a correlation between the original variable x and the factor z > 0.5 (or
< -0.5) indicates that the factor z is affected by the original variable x. Correlations closer
to zero denote that the variable is less important in influencing the factor.

7d. Collect the estimated factor scores for each statistical unit on each subset factor. 

8d. Analyze the factor scores, separately for each subset factor, with a parametric test 
for the comparison among groups, choosing the test appropriate for the experimental 
design adopted. 

In this example, the design includes treatment as a between-subject factor, subjects as a
random blocking factor nested under treatment, and days and hours as repeated measures
factors (Edwards, 1985; Wilcox, 1987; Winer et al., 1991).

9d. Present the results (see Presentation of Results, below). 
For quantitative variables 

As regards retrieving and standing on nest (unit 13.9), the observed response variable 
is the latency to perform the behavior. As concerns nursing, licking, digging, and self- 
grooming, the observed response variables are the number of 30-sec intervals when the 
behaviors are performed during the course of the whole test (which commonly lasts 30 
min). For this reason, variables are quantitative, ranging from 0 to the maximum number 
of intervals (e.g., 60). For all variables, the experimental design does not include repeated 
measures factors, i.e., it is usually simple. Therefore, to analyze these variables proceed 
as follows. 

Tests for the comparison among groups (most frequently used) 

1e. Check for the presence of observations equal to the cutoff value. If they are absent,
go to step 2e; otherwise go to step 5e.

2e. Check the normality of the distribution of data using the Shapiro-Wilk test or the
Kolmogorov-Smirnov test, Lilliefors modification (Armitage et al., 2002; Chiarotti,
2004). If data are normally distributed, go to step 3e; otherwise go to step 5e.
3e. Compare variances among subgroups of subjects using the Levene or the Bartlett
test (Armitage et al., 2002; Chiarotti, 2004). If variances are homogeneous, go to 
step 4e; otherwise go to step 5e. 

4e. Perform a parametric test for the comparison among groups (Edwards, 1985; Wilcox,
1987; Winer et al., 1991). Go to step 6e.

5e. Perform a nonparametric test for the comparison among groups (Marascuilo and
McSweeney, 1977; Brunner et al., 2002). Go to step 6e.

6e. Present the results (see Presentation of Results, below). 
Survival analysis 

In addition to the analysis described in steps 1e to 6e, retrieving latency can be subjected
to a survival analysis (Kaplan-Meier), which allows one to take into account the
competition between mutually exclusive responses. Indeed, when the mother is slow in
retrieving the pups, they can spontaneously huddle, thus creating a nest area before the
mother can do it herself. In this case, the mother's response must be considered as censored
(censored 1) at the time of pups' huddling, since she is no longer able to perform the
selected behavior. Finally, the mother must also be considered as censored (censored 2)
if neither retrieving nor huddling is observed before the cutoff time.

1f. Set the status of the subject equal to dead (i.e., the event has occurred) if the subject
has performed the behavior, or censored if not (whatever the cause).

2f. Set the survival time equal to the latency to perform the behavior, if the subject is 
dead. If the subject is censored 1, set the survival time equal to the time of pups' 
huddling. If the subject is censored 2, set the survival time equal to the cutoff time. 

3f. Perform the Kaplan-Meier survival analysis (Armitage et al., 2002) on these data
using treatment(s) as grouping variable(s).

4f. Present the results (see Presentation of Results, below). 
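Steps 1f to 3f can be sketched in plain Python: one helper codes each dam as (survival time, event), and a product-limit (Kaplan-Meier) estimator is then applied within one treatment group. This is a minimal sketch, assuming times in seconds and one call of `kaplan_meier` per treatment group; the function names `code_dam` and `kaplan_meier` and the example data are hypothetical, and a statistical package should be preferred for confidence intervals and formal group comparisons.

```python
def code_dam(latency, huddle_time, cutoff):
    """Steps 1f-2f: return (survival_time, event) for one dam.

    latency:     time at which the dam performed the behavior, or None
    huddle_time: time at which the pups huddled spontaneously, or None
    event = 1 ("dead"): behavior performed; event = 0: censored.
    """
    if latency is not None and latency <= cutoff and (huddle_time is None or latency < huddle_time):
        return latency, 1                  # behavior performed
    if huddle_time is not None and huddle_time < cutoff:
        return huddle_time, 0              # censored 1: pups huddled first
    return cutoff, 0                       # censored 2: nothing observed before cutoff


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each time with an event."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # all subjects sharing time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # subjects censored at t are still counted as at risk at t
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical dams of one treatment group, cutoff = 900 s
coded = [code_dam(120, None, 900), code_dam(None, 300, 900),
         code_dam(200, None, 900), code_dam(None, None, 900)]
times, events = zip(*coded)
for t, s in kaplan_meier(times, events):
    print(t, round(1 - s, 3))   # cumulative incidence = 1 - S(t)
```

The printed value 1 - S(t) is the cumulative incidence recommended under Presentation of Results.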

STATISTICAL TESTS FOR THE ANALYSIS OF BEHAVIORAL DATA 

Different tests can be used to analyze behavioral data, depending on the experimental
design adopted and the response variable. The tests used to check the assumptions on
which the parametric tests rely have already been mentioned (Shapiro-Wilk test or
Kolmogorov-Smirnov test, Lilliefors modification, for the normality assumption; Levene
test or Bartlett test for the homogeneity of variance; Greenhouse-Geisser or Huynh-Feldt
corrections for the sphericity assumption), as has the Bonferroni correction for multiple
comparisons. In addition, mention has been made of the Kaplan-Meier survival analysis,
the Cox proportional hazards model, principal component analysis (Armitage et al.,
2002), and logistic regression (Hosmer and Lemeshow, 2000), which, under some
conditions, can be used to analyze behavioral data. However, no specific mention has been
made of the tests for comparison among groups regarding the measures of location. For
this reason, a very brief description of such tests, both parametric and nonparametric,
is given below.

Details on all the mentioned tests will not be given here, because that would go beyond the aim of
this unit. However, a deeper insight into statistics for the health and social sciences can be 
obtained from statistical textbooks (Marascuilo and McSweeney, 1977; Edwards, 1985; 
Winer et al., 1991; Armitage et al, 2002; Brunner et al., 2002). Finally, remember that 
statistical software applications are now available for all personal computers; therefore, 
what is important is to be able to choose the appropriate method and to appropriately 
describe the experimental design, rather than to know all the calculations involved in 
performing the tests. 
Tests for the Comparison Among Groups

Parametric tests 

This category of tests requires that some assumptions be respected in order to make 
test results reliable. Two of them are common to all parametric tests, specifically: (1) 
independence of observations within each final subgroup; (2) normality of the distribution 
of data in each subgroup. Other assumptions depend on the specific test being performed. 

Student t test for two independent groups 

This is the test most frequently used to compare two groups of independent subjects in 
cases where a single observation is available for each subject (i.e., no repeated measures). 
Subjects are either randomly assigned to one of two groups according to the treatment 
administered by the experimenter (e.g., vehicle and drug), or randomly selected by the 
experimenter from the population of subjects presenting one of the two modalities of the 
characteristic under investigation (e.g., males and females). 

This test is based on the following additional assumptions: (3) independence of the 
observations between the two groups, and (4) homogeneity of variance between the two 
groups. It is important to note that the t test is robust against slight violations of the 
assumptions of normality and homogeneity of variance. Variances in the two groups can 
be compared, using, for example, the Bartlett, Fisher F, Cochran, or Levene test. If the 
variances are significantly different, tests for separate variances must be applied (e.g., 
Welch t test, Brown-Forsythe test), which are usually more conservative than the Student 
t test. 
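The Student (pooled-variance) and Welch (separate-variance) statistics can be computed directly; the sketch returns the t statistic and degrees of freedom of each, which is what Presentation of Results asks to report. A minimal sketch using only the Python standard library; p-values additionally require the t distribution (for example, `scipy.stats.ttest_ind` with `equal_var=False` performs the Welch test). The function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def student_and_welch(a, b):
    """Return ((t, df) of Student's test, (t, df) of Welch's test) for two groups."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)          # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)
    # Student t: pooled variance, df = na + nb - 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t_student = diff / math.sqrt(pooled * (1 / na + 1 / nb))
    # Welch t: separate variances, Welch-Satterthwaite degrees of freedom
    sa, sb = va / na, vb / nb
    t_welch = diff / math.sqrt(sa + sb)
    df_welch = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return (t_student, na + nb - 2), (t_welch, df_welch)


vehicle = [1, 2, 3, 4, 5]    # hypothetical scores
drug = [2, 4, 6, 8, 10]
(t_s, df_s), (t_w, df_w) = student_and_welch(vehicle, drug)
print(f"Student t({df_s}) = {t_s:.3f}; Welch t({df_w:.2f}) = {t_w:.3f}")
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ; with unequal sizes and variances the statistics themselves diverge.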

Student t test for two dependent groups (Paired t test) 

This is the test most frequently used to compare two groups of correlated statistical units. 
In particular, n pairs of correlated observations are collected, coming either from n pairs 
of related subjects (e.g., littermates, one tested after the administration of vehicle and 
the other after the administration of drug), or from n subjects, each tested under both 
conditions (vehicle and drug), with the order of treatment administration being randomly 
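assigned to each subject. No additional assumptions are required for this test; note,
however, that the normality assumption refers to the distribution of the differences between
paired observations. The paired t test is robust against slight violations of this assumption.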
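Because the paired t test is a one-sample t test on the within-pair differences, it can be sketched in a few lines. The function name and data are hypothetical; the p-value would be read from the t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t test computed as a one-sample t test on the differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [y - x for x, y in zip(first, second)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1


# Hypothetical scores of four subjects tested under vehicle and under drug
t, df = paired_t([1, 2, 3, 4], [2, 4, 5, 7])
print(f"t({df}) = {t:.3f}")
```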

Analysis of variance (ANOVA) 

ANOVA is a very useful method that allows one to assess the significance of the difference 
among two or more groups of subjects (i.e., the significance of the effect of one treatment 
factor with two or more levels). Moreover, factorial ANOVA allows one to assess the 
significance of the effect of more treatment factors, each with two or more levels, together 
with their interaction(s). For example, if male and female mice are treated with vehicle 
or drug, the experimental design includes four subgroups coming from the combination 
of sex and treatment, i.e., control males, drug males, control females, drug females. In 
this situation, it is possible to test the main effect of drug, the main effect of sex, and the 
interaction of drug and sex. 
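For the sex-by-treatment example, the sums of squares and F ratios of a factorial ANOVA can be computed as follows. This is a minimal sketch, assuming a completely randomized, balanced design (equal numbers of independent subjects per subgroup); the function name and data are hypothetical, and designs with blocks or repeated measures (described next) need different error terms, best obtained from a statistical package.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-factor ANOVA for a completely randomized design.

    cells maps (level_A, level_B) -> list of observations; every subgroup must
    contain the same number of independent subjects.
    Returns sums of squares, degrees of freedom, and F ratios.
    """
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch assumes a balanced design")
    a_lv = sorted({a for a, _ in cells})
    b_lv = sorted({b for _, b in cells})
    p, q = len(a_lv), len(b_lv)
    grand = mean(x for v in cells.values() for x in v)
    m_a = {a: mean(x for (ka, _), v in cells.items() if ka == a for x in v) for a in a_lv}
    m_b = {b: mean(x for (_, kb), v in cells.items() if kb == b for x in v) for b in b_lv}
    m_ab = {k: mean(v) for k, v in cells.items()}
    ss = {
        "A": q * n * sum((m_a[a] - grand) ** 2 for a in a_lv),
        "B": p * n * sum((m_b[b] - grand) ** 2 for b in b_lv),
        "AxB": n * sum((m_ab[(a, b)] - m_a[a] - m_b[b] + grand) ** 2
                       for a in a_lv for b in b_lv),
        "error": sum((x - m_ab[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {"A": p - 1, "B": q - 1, "AxB": (p - 1) * (q - 1), "error": p * q * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AxB")}
    return ss, df, f


# Hypothetical data: factor A = treatment, factor B = sex, 3 mice per subgroup
cells = {
    ("vehicle", "female"): [1, 2, 3], ("vehicle", "male"): [2, 3, 4],
    ("drug", "female"): [4, 5, 6], ("drug", "male"): [7, 8, 9],
}
ss, df, f = two_way_anova(cells)
for effect in ("A", "B", "AxB"):
    print(f"{effect}: F({df[effect]}, {df['error']}) = {f[effect]:.2f}")
```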

When performing the ANOVA, particular care must be taken in model selection. The
ANOVA model depends on the experimental design. There are three main types of 
experimental designs: (1) completely randomized designs, (2) randomized block designs, 
and (3) split-plot designs. 

Completely randomized designs (CRD). In this situation, subjects belonging to the differ- 
ent subgroups under comparison, corresponding to the different levels or combinations 
of levels of the factor(s) under study, are independent within each subgroup and among 
subgroups. The factor or factors under study are called between-subject factor(s) or
grouping factor(s).
Randomized block designs (RBD). In this case, subjects belonging to the different
subgroups under comparison are independent within each subgroup and dependent among
subgroups. Subgroups correspond to the different levels or combinations of levels of the 
factor(s) under study, which are referred to as conditions in the following discussion. As 
in the case of the paired t test, this situation can derive from (1) one group of subjects, or 
(2) more groups of related subjects (e.g., littermates). In the former case, each subject is 
tested under all different conditions, while in the latter case, within each group, subjects 
are assigned to different conditions, one subject per condition. The factor(s) under study 
is(are) called within-subject factor(s) or repeated measures factor(s). 

Split-plot designs. Split-plot designs are a combination of CRD and RBD. A random 
sample of blocks (e.g., litters), each consisting of more than one subject (e.g., littermates),
is drawn from a population. The blocks are randomly assigned to one out of two or more
conditions, treatments, or combinations of treatments, i.e., between-subject factor(s).
The units within each block are randomly assigned to different treatments or to
combinations of treatments, i.e., within-subject factor(s), or are evaluated at different
times, i.e., repeated measures. Between-subject, within-subject, and repeated measures
factors are usually fixed-effect factors, whereas blocks and units are random-effect factors.



Nonparametric tests 

This category of tests generally relies on the following assumptions: (1) independence 
of observations within each final subgroup, and (2) continuity of the response variable 
in the sampled population. The latter implies that equal observations (ties) should be 
unlikely. However, even though a variable is continuous in theory, all measurements are
necessarily made on a discrete scale; thus equal observations can occur more frequently than
expected. Usually, the first step in nonparametric tests is the transformation of the original
observations into ranks. Equal observations are commonly transformed into equal ranks, 
corresponding to the mean of the ranks that would have been assigned to the equal 
observations had they been different. In the presence of ties, appropriate corrections for 
the test statistic must be adopted. This holds true for all tests that will be described in the 
following paragraphs. 
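The conversion of observations into ranks, with tied values receiving the mean of the ranks they would otherwise occupy, can be sketched as follows. The function name `midranks` is hypothetical; if SciPy is available, `scipy.stats.rankdata` with `method='average'` performs the same ranking.

```python
def midranks(values):
    """Rank observations (1 = smallest); tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


print(midranks([10, 20, 20, 5]))   # [2.0, 3.5, 3.5, 1.0]
```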

Mann-Whitney U test for two independent groups 

This test is the nonparametric counterpart of the Student t test for two independent 
groups. It is designed to test the hypothesis of equality of the underlying distributions. 
Compared to the t test, the U test has an asymptotic efficiency equal to 95.5% when the 
assumptions for the t test are respected. This means that the U test is almost as powerful 
as the t test in detecting differences between two independent groups, and is therefore 
a good alternative to the t test when the latter cannot be performed because of violation 
of the assumptions. Note that the minimum total sample size that allows the U test to detect
differences between the two groups under comparison at the 5% significance level is n = 7
(e.g., 2 versus 5 observations) for the one-sided test, or n = 8 (e.g., 3 versus 5
observations) for the two-sided test; if one group has only 2 observations, the other must
have at least 8 (n = 10) for the two-sided test.
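Minimum sample sizes of this kind follow from the smallest p-value that an exact test can produce: if even the most extreme possible outcome cannot reach the chosen significance level, no difference can ever be declared significant. A minimal sketch, assuming no ties, exact (not normal-approximation) p-values, and α = 0.05; the function names are hypothetical.

```python
from math import comb


def min_p_mann_whitney(n1, n2, two_sided=True):
    """Smallest exact p-value the Mann-Whitney U test can reach (no ties).

    Under H0 all comb(n1 + n2, n1) allocations of the ranks are equally likely;
    the most extreme one (complete separation) has probability 1 / comb(n1 + n2, n1).
    """
    p = 1 / comb(n1 + n2, n1)
    return min(1.0, 2 * p) if two_sided else p


def smallest_design(alpha=0.05, two_sided=True, max_total=40):
    """Smallest total n (and a split n1 <= n2) that can give p < alpha."""
    for total in range(2, max_total + 1):
        for n1 in range(1, total // 2 + 1):
            if min_p_mann_whitney(n1, total - n1, two_sided) < alpha:
                return total, (n1, total - n1)
    return None


print(smallest_design(two_sided=False))   # (7, (2, 5))
print(smallest_design(two_sided=True))    # (8, (3, 5))
print(min_p_mann_whitney(2, 7), min_p_mann_whitney(2, 8))
```

The last line shows why, with only two observations in one group, the two-sided test needs at least eight in the other: 2/36 is above 0.05, whereas 2/45 is below it.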



Wilcoxon test for two dependent groups 

This test is the best-known nonparametric counterpart of the Student t test for two
dependent groups. It is based on the differences between paired observations, which are 
transformed into ranks for successive computations. The test assumes the continuity of 
the distribution of such differences. This implies that differences equal to 0 are unlikely. 
Unfortunately, as measured observations are necessarily discrete, differences equal to 0 
are more frequent than expected. Usually, such differences are dropped (this is the option 
adopted in most statistical software applications), reducing the effective sample size. 
Note that the minimum effective sample size required by the Wilcoxon matched-pair
test to detect differences between the paired groups under comparison is n = 5 for the 
one-sided test, or n = 6 for the two-sided test. 
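These minimum sizes follow from the smallest exact p-value of the signed-rank test, which depends only on the number of non-zero differences left after zero differences are dropped. A minimal sketch, assuming no ties among the absolute differences and α = 0.05; the function names and data are hypothetical.

```python
def effective_n(first, second):
    """Number of pairs with a non-zero difference (zero differences are dropped)."""
    return sum(1 for x, y in zip(first, second) if x != y)


def min_p_signed_rank(n, two_sided=True):
    """Smallest exact p-value of the signed-rank test with n non-zero differences.

    Under H0 each of the 2**n sign patterns is equally likely; the most extreme
    pattern (all differences with the same sign) has probability 1 / 2**n.
    """
    p = 1 / 2 ** n
    return min(1.0, 2 * p) if two_sided else p


# Hypothetical scores: six pairs, two of which have a zero difference
vehicle = [3, 5, 2, 4, 6, 1]
drug = [3, 7, 2, 5, 8, 4]
n_eff = effective_n(vehicle, drug)
print(n_eff, min_p_signed_rank(n_eff, two_sided=True))   # 4 0.125
```

In this example six animals were tested, but after dropping the two zero differences no two-sided result could reach significance at the 5% level.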

Kruskal-Wallis analysis of variance for independent groups 

This is the nonparametric counterpart of the parametric ANOVA for CRD (see above).
It is designed to test the hypothesis of equality of the underlying distributions among
the groups defined by the different levels of one between-subject factor. Compared to
parametric ANOVA, the Kruskal-Wallis ANOVA has an asymptotic relative efficiency of
95.5% when the assumptions for the parametric ANOVA are respected. If the response
variable follows a non-normal distribution, the efficiency of the Kruskal-Wallis ANOVA
never falls below 86.4%, and it can reach or exceed 100% (e.g., it equals 100% for the
uniform distribution and 300% for the exponential distribution). The Kruskal-Wallis
ANOVA is relatively insensitive to differences in spread and shape of the data distributions
(thus it is not much affected by violations of the normality and homogeneity of variance
assumptions), whereas it is most sensitive to differences in centers (such as means,
medians, or other indexes of location).
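The efficiency figures quoted above follow from a classical result (due to Hodges and Lehmann) for the asymptotic relative efficiency of rank tests against their normal-theory counterparts under a location shift: ARE = 12σ²(∫f²)², where f is the density of the response and σ² its variance. The formula is not given in this unit; the sketch simply evaluates it in closed form for a few standard densities, with the parabolic density giving the 86.4% lower bound. The function name is hypothetical.

```python
import math


def are_rank_test(variance, int_f_squared):
    """ARE of a rank test (Wilcoxon / Kruskal-Wallis) vs. the t / F test.

    For a location shift in a continuous density f with the given variance:
        ARE = 12 * variance * (integral of f(x)**2 dx) ** 2
    """
    return 12 * variance * int_f_squared ** 2


# (variance, integral of f^2) in closed form for a few standard densities
densities = {
    "normal": (1.0, 1 / (2 * math.sqrt(math.pi))),   # gives 3/pi, about 0.955
    "parabolic (lower bound)": (1 / 5, 3 / 5),       # f(x) = 0.75(1 - x^2) on [-1, 1]
    "uniform": (1 / 12, 1.0),
    "exponential": (1.0, 0.5),
}
for name, (var, int_f2) in densities.items():
    print(f"{name:24s} ARE = {are_rank_test(var, int_f2):.3f}")
```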

When the experimental design includes more than one factor (e.g., two factors, sex and
treatment, with two levels each, giving four final subgroups), the main effects of the
different factors and their interaction can be assessed by χ² partitioning. This method
gives an exact partitioning of the overall Kruskal-Wallis χ² only for balanced designs,
i.e., when sample sizes are equal in all final subgroups. Unfortunately, this method is not
available in the most common statistical software applications.

Friedman analysis of variance for dependent groups 

This is the nonparametric counterpart of the parametric ANOVA for RBD. It allows one
to test the hypothesis of equality of the underlying distributions among the groups defined
by the different levels of one within-subject factor, against the alternative hypothesis of a
difference in centers (such as means, medians, or other indexes of location). Compared
to parametric ANOVA, when the assumptions required by parametric ANOVA are
respected, the Friedman ANOVA has an asymptotic relative efficiency E = 3K/[π(K + 1)],
where K is the number of groups under comparison. As can be seen, the asymptotic
relative efficiency increases with K, from a minimum of 63.7% (when K = 2) toward
95.5% (the limit as K becomes very large).
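The efficiency formula E = 3K/[π(K + 1)] can be tabulated directly to see how quickly it approaches its limit of 3/π. A minimal sketch; the function name is hypothetical.

```python
import math


def friedman_are(k):
    """ARE of the Friedman test vs. RBD ANOVA with normal errors, for k groups."""
    return 3 * k / (math.pi * (k + 1))


for k in (2, 3, 5, 10, 50):
    print(f"K = {k:3d}: E = {friedman_are(k):.1%}")
print(f"limit as K grows: {3 / math.pi:.1%}")
```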

When the experimental design includes more than one factor (e.g., two factors, day
and trial, with 5 and 10 levels, respectively, giving 50 final subgroups), the main effects of
the different factors and their interaction can be assessed by χ² partitioning. This method
gives an exact partitioning of the overall Friedman χ² when the design is balanced, i.e.,
when sample sizes are equal in all final subgroups (this is the usual situation, since
imbalance can arise only from missing values). Unfortunately, χ² partitioning is not
available in the most common statistical software applications.

Nonparametric ANOVA for split-plot designs 

Data from split-plot designs cannot be analyzed using nonparametric ANOVA unless
each within-subject or repeated measures factor has only two levels. In this situation,
the significance of the interaction of the between-subject factor(s) with the
within-subject factor(s) can be assessed by performing the Kruskal-Wallis ANOVA on
the difference between the two levels of the within-subject factor, with the
between-subject factor as the grouping variable. Unfortunately, this analysis cannot be
performed directly with the most common statistical software applications.
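The difference-score approach just described can be sketched in a few lines, assuming each subject was measured once at each of the two levels of the within-subject factor. The function names are hypothetical, and the Kruskal-Wallis H is computed without the correction for ties; H would be referred to a χ² distribution with (number of groups - 1) degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without tie correction; returns (H, number of groups - 1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)  # ranks over all subjects together
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
    return h, len(groups) - 1


def interaction_test(level1_by_group, level2_by_group):
    """Between x within interaction for a two-level within-subject factor.

    Each argument holds one list per between-subject group, with one value per
    subject (same subject order in both). The test is run on level 2 - level 1.
    """
    diffs = [[b - a for a, b in zip(g1, g2)]
             for g1, g2 in zip(level1_by_group, level2_by_group)]
    return kruskal_wallis_h(diffs)
```

With two groups whose within-subject differences are [1, 2, 3] and [4, 5, 6], H = 27/7 (about 3.86) on 1 degree of freedom.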

PRESENTATION OF RESULTS

Tables and figures can be used to present the results of statistical analyses. To
synthesize qualitative data, use absolute and percent frequencies in the different
categories of each variable. For quantitative data, use the appropriate indexes of
location (mean) and variability (variance, standard deviation, standard error). When the
distribution of the data is markedly asymmetrical (presence of outliers or of cutoff
values), it is preferable to use the median with the range, interquartile range, or median
absolute deviation (Armitage et al., 2002).
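As an illustration of these summaries, here is a sketch using only Python's standard `statistics` module (Python 3.8 or later). The function name is hypothetical. Quartiles are computed with the "inclusive" (linear interpolation) method, which is only one of several definitions in use, so interquartile ranges may differ slightly from those of other software; the median absolute deviation is left unscaled.

```python
import math
import statistics


def describe(values):
    """Descriptive summaries for one quantitative variable (at least 2 values)."""
    n = len(values)
    variance = statistics.variance(values)  # sample variance, n - 1 denominator
    sd = math.sqrt(variance)
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        # appropriate for roughly symmetrical distributions
        "mean": statistics.mean(values),
        "variance": variance,
        "sd": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
        # preferable for markedly asymmetrical distributions
        "median": med,
        "range": (min(values), max(values)),
        "iqr": q3 - q1,
        "mad": statistics.median([abs(x - med) for x in values]),
    }
```

For the values [1, 2, 3, 4, 100], the single outlier pulls the mean up to 22, while the median (3), interquartile range (2), and median absolute deviation (1) are unaffected, which is why the median-based indexes are preferred for asymmetrical data.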

In addition, when a logistic regression has been performed, present odds ratios (ORs) 
with the corresponding confidence interval (CI). Commonly, the 95% CI is used (Hosmer 
and Lemeshow, 2000). 
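A minimal sketch of how an odds ratio and its Wald 95% CI follow from a logistic regression coefficient β and its standard error: OR = exp(β) and CI = exp(β ± 1.96 SE). The function names are hypothetical. For a single binary predictor, β and SE coincide with the log odds ratio of the 2 x 2 table and Woolf's standard error, which the helper uses to produce a self-contained example; profile-likelihood intervals, an alternative offered by some software, are not shown.

```python
import math


def log_odds_ratio_2x2(a, b, c, d):
    """Log OR and its SE from a 2 x 2 table (all cells > 0).

    a: exposed with response, b: exposed without response,
    c: unexposed with response, d: unexposed without response.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error
    return log_or, se


def odds_ratio_ci(beta, se, z=1.959964):
    """OR = exp(beta) with Wald CI exp(beta -/+ z * SE); z = 1.96 gives 95%."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

For a table with a = 20, b = 10, c = 10, d = 20, the OR is 4 and its 95% CI runs from roughly 1.37 to 11.7.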

Similarly, for Kaplan-Meier survival analysis, present the cumulative incidence (that is, 
the incidence of subjects achieving the response) at a given time since the beginning of 
observation, with the corresponding 95% CI. 
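The cumulative incidence at time t is 1 - S(t), where S(t) is the Kaplan-Meier survival estimate. The sketch below assumes one record per subject (time of response or of censoring, plus a 0/1 event flag) and uses Greenwood's variance with a plain normal approximation, clipped to [0, 1]; other interval constructions (e.g., log-log) are common and may be preferable for small samples. The function name is hypothetical.

```python
import math


def km_cumulative_incidence(times, events, z=1.959964):
    """Kaplan-Meier cumulative incidence 1 - S(t) with approximate 95% CI.

    times: time of response or censoring for each subject.
    events: 1 if the response was observed, 0 if the subject was censored.
    Returns (t, incidence, lower, upper) at each distinct response time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, greenwood = 1.0, 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0  # responses and censorings tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / at_risk
            if at_risk > d:
                greenwood += d / (at_risk * (at_risk - d))
            se = surv * math.sqrt(greenwood)  # Greenwood SE of S(t), same for 1 - S(t)
            inc = 1.0 - surv
            out.append((t, inc, max(0.0, inc - z * se), min(1.0, inc + z * se)))
        at_risk -= d + c  # subjects censored at t count as at risk at t
    return out
```

For responses at days 1, 2, and 3 and censorings at days 2 and 4 in five subjects, the estimated cumulative incidences are 0.2, 0.4, and 0.7.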

Finally, remember to report the test statistics with the pertinent degrees of freedom and
the exact significance levels. Reporting significance only as p < 0.05 or p = ns is
strongly discouraged, except for multiple comparison tests.

In preparing tables, follow a few main rules. 

1. Subdivide complex tables into two or more simpler tables. 

2. Put all necessary information in the title and legend of the table (e.g., measurement
units for the variables, explanation of symbols, marginal and overall totals,
denominators for computation of percentages).

3. Clearly state the source of data (if not original). 

Follow the same rules in preparing figures. In addition, pay attention to the maximum
number of different curves and symbols that can be managed in one figure. When
reporting the results of statistical analyses in figures, the significance of the comparison
between two groups can be indicated with asterisks, whose number or type depends on
the significance level, e.g., * p < 0.05, ** p < 0.01, and *** p < 0.001. Different
types of graphics can be used to present data, depending on the statistical method used for
data analysis (Table 13.8.1). The following discussion briefly describes bar charts, box
and whisker plots, histograms, Kaplan-Meier cumulative incidence curves, line diagrams,
and PCA graphs (Jackson, 1991; Armitage et al., 2002).
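The asterisk convention and the recommendation to report exact p values can be combined in a small helper. This is a sketch only: the function names are hypothetical, strict inequalities (p < 0.05, and so on) are assumed, and the four-decimal format with a "p < 0.0001" floor is an illustrative choice rather than a rule stated in this unit.

```python
def significance_stars(p):
    """Asterisk code for figures: * p < 0.05, ** p < 0.01, *** p < 0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level: no asterisk


def format_p(p):
    """Exact significance level for tables and text (illustrative format)."""
    return "p < 0.0001" if p < 0.0001 else f"p = {p:.4f}"
```

For instance, `format_p(0.0169)` gives "p = 0.0169" for the text, while `significance_stars(0.0169)` gives a single asterisk for the figure.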

Bar chart 

Similar to the histogram (and frequently mistaken for it), the bar chart is very useful for
quantitative variables with a normal or symmetrical distribution. It shows the distribution
of one quantitative variable across the levels of other variable(s), i.e., grouping
factor(s). The height of each rectangle equals the mean of the variable in the
corresponding group, and the whisker above it extends by one standard deviation (or one
standard error) of the same group. For an example, see Figure 13.8.1.

Box and whisker plot 

This presentation is very useful for quantitative variables with a non-normal or
asymmetrical distribution. It synthesizes one quantitative variable (outcome) across the
levels of other variable(s), i.e., grouping factor(s), by presenting some measures of
location and variability. Boxes cover the interquartile range and are

Table 13.8.1 Graphical Presentations Appropriate for the Different Statistical Approaches

Statistical approach | Repeated measures? | Graphical presentation
Parametric test for comparison among groups | Without repeated measures | Bar chart; box and whisker plot
Parametric test for comparison among groups | With repeated measures | Line graph
Nonparametric test for comparison among groups | Without repeated measures | Box and whisker plot
Nonparametric test for comparison among groups | With repeated measures | Line graph
Kaplan-Meier survival analysis | - | Cumulative incidence curve; survival curve
Cox proportional hazards model | - | Hazard ratios with 95% CI(a)
Principal component analysis | - | PCA graph
Logistic regression | - | Odds ratios with 95% CI(a)

(a) CI, confidence interval.

Figure 13.8.1 (A) Example of a bar chart. Results of parametric analysis of variance performed
on litter weight for the comparison between hypoxic (hypo) and control (ctrl) dams, every 2 days
from postnatal day (pnd) 1 to pnd 13 (7 recordings). The mean litter weight is reported, with vertical
bars representing the standard error of the mean. (B) The coefficient of variation (CV; i.e., the
standard deviation divided by the group mean) has been plotted against pnd to show the increase in
this measure occurring in control litters, compared to those whose mothers had undergone hypoxia
immediately after birth. Reprinted from Neurotoxicology and Teratology, Vol. 25, Cirulli et al. (2003).
Long-term effects of acute perinatal asphyxia on rat maternal behavior, p. 575, Copyright 2003,
with permission from Elsevier.

Figure 13.8.2 Example of box and whisker plot. Results of nonparametric Kruskal-Wallis analysis
of variance for the comparison between two groups of CD-1 mice, one exposed to millipede
aversive odor (FM) and the other to vehicle odor (C). The figure shows a box and whisker plot of
the latency of the first wall rearing recorded in mice during the hot-plate test, performed after
15 min of exposure to the stimulus object. The line in the middle of the box represents the
median. The box covers the interquartile range (IQR); whiskers extend to the upper and lower
adjacent values. The upper adjacent value is defined as the largest data point less than or equal
to x[75] + (1.5 × IQR), and the lower adjacent value as the smallest data point greater than or
equal to x[25] - (1.5 × IQR), where x[25] and x[75] are the 25th and 75th percentiles. Outliers
(points more extreme than the adjacent values) are represented by the letter "o." The symbol §§
represents p < 0.001. Reprinted from Brain Research Bulletin, Vol. 58(2), Capone et al. (2002). A new
easy accessible and low-cost method for screening olfactory sensitivity in mice: Behavioural and
nociceptive response in male and female CD-1 mice upon exposure to millipede aversive odour,
p. 201, Copyright 2002, with permission from Elsevier.




divided by the median; whiskers extend to the range or to some other measure of dispersion
(e.g., a percentile range). Observations that fall beyond the whiskers represent outliers.
More outliers in one direction than in the other indicate an asymmetrical distribution of
the data. For an example, see Figure 13.8.2.
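The box and whisker quantities defined in the Figure 13.8.2 legend (quartiles, IQR, adjacent values within 1.5 × IQR of the box, and outliers beyond them) can be computed as sketched below. The function name is hypothetical, and quartiles are obtained with the standard library's "inclusive" linear-interpolation method; plotting packages may use a different percentile definition, so whisker ends can differ slightly.

```python
import statistics


def box_whisker_summary(values):
    """Quartiles, Tukey-style adjacent values, and outliers (at least 2 values)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    # adjacent values: the most extreme observations still inside the fences
    upper_adjacent = max(x for x in data if x <= upper_fence)
    lower_adjacent = min(x for x in data if x >= lower_fence)
    outliers = [x for x in data if x < lower_adjacent or x > upper_adjacent]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "lower_adjacent": lower_adjacent, "upper_adjacent": upper_adjacent,
            "outliers": outliers}
```

For [1, 2, 3, 4, 100], the box runs from 2 to 4 around a median of 3, the whiskers end at 1 and 4, and 100 is plotted as an outlier.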

Histogram 

A histogram shows the frequency distribution across the classes of a categorized
quantitative variable or of a qualitative variable. Rectangles whose areas are proportional
to the class frequencies are drawn on portions of the x axis, the width of each portion
representing the class interval of the variable. For qualitative variables (and whenever all
classes have the same width), the height of each rectangle is proportional to the frequency
in the corresponding class.
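Because it is the area, not the height, that must be proportional to frequency, classes of unequal width need a height equal to frequency divided by class width (frequency density). A minimal sketch, with a hypothetical function name and the assumption that classes are half-open intervals [lower, upper) except the last, which also includes its upper edge:

```python
def histogram_heights(values, edges):
    """Class counts and bar heights (count / class width) for given class edges.

    Classes are [edges[i], edges[i + 1]); the last class also includes its upper
    edge. Values outside the outermost edges are ignored.
    """
    k = len(edges) - 1
    counts = [0] * k
    for x in values:
        for i in range(k):
            in_class = edges[i] <= x < edges[i + 1] or (i == k - 1 and x == edges[k])
            if in_class:
                counts[i] += 1
                break
    # height = frequency density, so each bar's area equals its class count
    heights = [counts[i] / (edges[i + 1] - edges[i]) for i in range(k)]
    return counts, heights
```

With class edges 0, 4, 6, and 10, the narrow middle class holding one value gets height 0.5, while the wide classes holding three values each get 0.75, so the areas (3, 1, 3) match the counts.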



Kaplan-Meier cumulative incidence curve 

This graphical method represents the increase over time in the cumulative incidence of
subjects showing the response. When survival analysis is performed on behavioral
developmental data, the curve represents the cumulative proportion of subjects achieving
the developmental stage chosen as the response. Several cumulative incidence curves
(one per treatment group) can be represented in one figure, using different symbols

Figure 13.8.3 Example of Kaplan-Meier cumulative incidence curve.
Results of the Kaplan-Meier survival analysis performed on the swift righting score collected at
postnatal days 2 to 11 and 13 to 15 on male and female mice, born from primiparous or multiparous
dams, undergoing three different treatments (1, 2, and 3). (A) Female mice born from primiparous
dams; (B) female mice born from multiparous dams; (C) male mice born from primiparous dams;
(D) male mice born from multiparous dams. The swift righting behavior was considered adult-like
at score 3; thus, attaining that score was treated as the event ("death"). The first day on which
the subject attained a score of 3 was taken as the time to the event. Subjects that did not reach
that score were considered censored and were assigned a time equal to the span of observation
(follow-up time = 15 days). Sex was used as the stratifying variable, while combinations of parity
and treatment were used as the grouping variable (6 levels). A significant difference was observed
among the six groups within each stratum (p < 0.0005 for all tests for the comparison of survival
curves). F. Chiarotti and D. Santucci (unpub. observ.).

to distinguish among different treatment groups. Alternatively, survival curves can be
presented, showing the decrease over time in the proportion of subjects still free from the
event. For an example, see Figure 13.8.3.



Line diagram 

This technique represents data collected over a period of time or over increasing doses
of a drug. Mean values of the response variable are plotted as dots, which are connected
by lines showing the trend (increase, decrease, or no change) over time or over doses.
Several curves can be presented in the same figure, one per subgroup, based on one or
more between-subject factor(s). This helps to visualize

Figure 13.8.4 Example of line diagram. Results of parametric analysis of variance performed on the duration of bar
holding behavior for the comparison between two groups of CD-1 mice, exposed either to millipede aversive odor (FM) or
to vehicle odor (C). For each mouse, the behavior was recorded for 5 consecutive days, 15 min (3 blocks of 5 min each)
per day. The behavior displayed near the stimulus object was distinguished from the behavior displayed far from the
stimulus object. The figure shows the mean durations of the nonavoidance response of bar holding throughout the 5 days of
testing. The vertical bar on the right-hand side of the figure indicates the pooled standard error of the mean. Under the
hypothesis of homogeneity of variance, this is the best estimator of the standard error, because it is based on more
degrees of freedom. It is computed as the square root of the ratio between the error term in the ANOVA (i.e., the mean
square of the residual appropriate for the assessment of the effect represented in the figure) and the number of
observations contributing to each mean reported in the figure. Reprinted from Brain Research Bulletin, Vol. 58(2), Capone
et al. (2002). A new easy accessible and low-cost method for screening olfactory sensitivity in mice: Behavioural and
nociceptive response in male and female CD-1 mice upon exposure to millipede aversive odour, p. 197, Copyright 2002,
with permission from Elsevier.

Figure 13.8.5 Example of PCA graph. Results of principal component analysis performed on
maternal behavior of hypoxic and control dams, observed in the home cage. Factorial axis 4.
Mean (SEM) coordinates of individuals as a function of postnatal treatment. Ctrl = controls, n =
6; hypo = hypoxic, n = 7. Perinatal hypoxia affected the animals' spread across the axis, increasing
the probability of finding hypoxic subjects on the positive side (p = 0.0169). Reprinted from
Neurotoxicology and Teratology, Vol. 25, Cirulli et al. (2003). Long-term effects of acute perinatal
asphyxia on rat maternal behavior, p. 574, Copyright 2003, with permission from Elsevier.

the effect of the between-subject factor(s) on the trend over time or over doses. For an
example, see Figure 13.8.4.
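The pooled standard error described in the Figure 13.8.4 legend is the square root of the ANOVA error mean square divided by the number of observations behind each plotted mean. The sketch below covers only the simplest case, a one-way between-subject design, where the error term is the within-group mean square; for repeated measures or split-plot designs the appropriate error term is the residual mean square for the plotted effect, which this sketch does not compute. Function names are hypothetical.

```python
import math
import statistics


def ms_error_one_way(groups):
    """Within-group mean square SS_within / (N - k) for a one-way design."""
    n_total = sum(len(g) for g in groups)
    ss_within = 0.0
    for g in groups:
        m = statistics.mean(g)
        ss_within += sum((x - m) ** 2 for x in g)
    return ss_within / (n_total - len(groups))


def pooled_sem(ms_error, n_per_mean):
    """Pooled SEM = sqrt(MS_error / n), n = observations behind each plotted mean."""
    return math.sqrt(ms_error / n_per_mean)
```

For two groups [1, 2, 3] and [4, 5, 6], the error mean square is 1.0 on 4 degrees of freedom, and the pooled SEM for means of three observations is about 0.577.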

PCA graph 

For the presentation of PCA results, one-dimensional (x diagram) or two-dimensional (xy
diagram) graphs can be used. The graphical representation must be limited to those PCA
factors that can be easily interpreted in terms of the original variables and that allow a
significant discrimination among the experimental groups. Each PCA factor is represented
on one axis, ranging from -1 to +1. The original variables negatively correlated with a
PCA factor are reported near the negative pole of the axis representing that factor,
while variables positively correlated with the factor appear near the positive pole. Along
the axis, the mean values (dots) and the standard deviations (bars) of the subgroups that
differ on that PCA factor must be reported. For an example, see Figure 13.8.5.
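The two ingredients of such a graph are the correlation of each original variable with the factor (which places it between -1 and +1 on the axis) and the mean and standard deviation of the subjects' factor scores within each group. The following pure-Python sketch extracts only the first component of the correlation matrix, using power iteration as a stand-in for the full eigendecomposition that statistical packages perform; the function names and example data are hypothetical. For standardized variables, the correlation of variable j with the component equals sqrt(λ) v_j, where λ is the eigenvalue and v the unit eigenvector. The sign of a component is arbitrary, so the two poles may be swapped relative to other software.

```python
import math
import statistics


def first_component(data, iterations=200):
    """Leading principal component of the correlation matrix.

    data: one row per subject, one column per variable (no constant columns).
    Returns (eigenvalue, variable-factor correlations, subject scores).
    """
    n, p = len(data), len(data[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in data]
        m, s = statistics.mean(col), statistics.pstdev(col)
        cols.append([(x - m) / s for x in col])  # standardized, so R = Z'Z / n
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n for j in range(p)]
            for i in range(p)]
    # power iteration, started from the column of R with the largest norm
    v = max((list(c) for c in corr), key=lambda c: sum(x * x for x in c))
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    scores = [sum(cols[j][k] * v[j] for j in range(p)) for k in range(n)]
    # correlation of each variable with the component, always within [-1, 1]
    var_corr = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, var_corr, scores


def group_summary(scores, groups):
    """Mean and SD of component scores per group (at least 2 subjects each)."""
    summary = {}
    for g in sorted(set(groups)):
        s = [x for x, label in zip(scores, groups) if label == g]
        summary[g] = (statistics.mean(s), statistics.stdev(s))
    return summary
```

A usage example: with three behaviors where the second rises and the third falls with the first, the first two variables land at one pole and the third at the opposite pole, and the group means of the scores fall on opposite sides of zero.

```python
data = [[1, 2.1, -1.0], [2, 4.0, -2.2], [3, 6.2, -2.9],
        [4, 7.9, -4.1], [5, 10.1, -5.0], [6, 12.0, -6.1]]
eigenvalue, var_corr, scores = first_component(data)
print(group_summary(scores, ["ctrl"] * 3 + ["hypo"] * 3))
```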

Table 13.8.2 Aim and Characteristics of Statistical Methods for the Analysis of Behavioral Test Data

Description of statistical methods

Behavioral test | Aim | Univariate/multivariate | Response variable | Pros | Cons | Statistical method for comparison | Degree of complexity
Fox battery | Analysis of a summary measure of the score profile | Univariate | First day of adult-like response | Allows detection of anticipations or delays of the full response | Does not take into account the whole curve | Test for comparison among groups | Low
Fox battery | Analysis of a summary measure of the score profile | Univariate | Area under the curve | Takes into account the whole curve | Does not allow one to detect anticipations or delays of the full response | Test for comparison among groups | Low
Fox battery | Analysis of the whole score profile | Univariate | Adult-like response (yes vs. no) and time to adult-like response | Takes into account the whole curve | Inappropriate for correlated data (e.g., littermates assigned to different treatment groups) | Survival analysis: Kaplan-Meier, Cox proportional hazards model | Medium/high
Ultrasound vocalizations | Evaluation of calls in terms of number, category, and frequency band | Univariate | Number of calls | - | - | Test for comparison among groups | Low
Passive avoidance | Acquisition phase: analysis of the number of trials to criterion | Univariate | Number of trials to reach the criterion | Allows detection of anticipations or delays in the attainment of the criterion | Does not take into account the velocity in stepping through in the repeated trials | Test for comparison among groups | Low
Passive avoidance | Acquisition phase: analysis of the number of trials to criterion | Univariate | Attainment of the criterion (yes vs. no) and number of trials to reach the criterion | Allows detection of anticipations or delays in the attainment of the criterion and directly deals with subjects not attaining the criterion | Does not take into account the velocity in stepping through in the repeated trials | Survival analysis: Kaplan-Meier, Cox proportional hazards model | Medium/high
Passive avoidance | Acquisition phase: analysis of the latency profile to step-through | Univariate | Median latency value | Takes into account the velocity in stepping through and is not affected by outliers | Not very sensitive to anticipations or delays in the attainment of the criterion | Test for comparison among groups | Low
Passive avoidance | Acquisition phase: analysis of the latency profile to step-through | Univariate | Area under latency curve | Takes into account the velocity in stepping through and the whole curve | Not very sensitive to anticipations or delays in the attainment of the criterion | Test for comparison among groups | Low
Passive avoidance | Retest trial: analysis of the latency to step-through | Univariate | Latency to step-through | - | - | Test for comparison among groups | Low
Morris water maze | Acquisition phase: analysis of the latency profile to reach the platform | Univariate | Median latency value | Takes into account the velocity in reaching the platform and is not affected by outliers | Not very sensitive to anticipations or delays in the attainment of the criterion | Test for comparison among groups | Low
Morris water maze | Acquisition phase: analysis of the latency profile to reach the platform | Univariate | Area under latency curve | Takes into account the velocity in reaching the platform and the whole curve | Not very sensitive to anticipations or delays in the attainment of the criterion | Test for comparison among groups | Low
Morris water maze | Acquisition phase: analysis of the mean velocity | Univariate | Mean velocity profile in repeated days and trials | Takes into account the whole profile | - | Test for comparison among groups | Low
Morris water maze | Acquisition phase: analysis of the path length | Univariate | Path length profile in repeated days and trials | Takes into account the whole profile | Affected by the animal's (hyper)activity | Test for comparison among groups | Low
Morris water maze | Acquisition phase: analysis of the time spent in the peripheral annular area close to the wall | Univariate | Profile of the time spent in the peripheral annular area close to the wall on repeated days and trials | Takes into account the whole profile | Affected by the animal's boredom | Test for comparison among groups | Low
Morris water maze | Retest trial: analysis of locomotor activity | Univariate | Crossings of acquisition quadrant | - | - | Test for comparison among groups | Low
Morris water maze | Retest trial: analysis of total time | Univariate | Total time spent in the acquisition quadrant | - | - | Test for comparison among groups | Low
Spatial open field | Analysis of general behaviors | Univariate | Frequency (or duration) of each general behavior (e.g., rearing, wall rearing, grooming, locomotor activity) and of behavior specifically directed towards the object (e.g., object contacts) | Allows one to assess the specific effect of treatment on each observed behavior | Does not take into account the correlation among the behavioral categories | Test for comparison among groups | Low
Spatial open field | Analysis of behavioral pattern | Multivariate | Frequency (or duration) of all general behaviors, for each subject in each repeated session | Takes into account the correlation among the behavioral categories | May result in factors that are difficult to interpret | Principal component analysis | Medium/high
Maternal behavior | Analysis of maternal behavior in the home cage | Univariate | 0/1 scores, for each behavior, time point, and day of observation | Takes into account whether or not the behavior has been displayed (good for very rare behaviors) | Does not quantify display of the behavior | Logistic regression | Low/medium
Maternal behavior | Analysis of maternal behavior in the home cage | Univariate | Number of instantaneous samples in which the behavior has been displayed, for each time point and day of observation | Allows a gross quantification of the display of the behaviors | Not appropriate for very rare behaviors | Test for comparison among groups | Low
Maternal behavior | Analysis of maternal behavior in the home cage | Multivariate | Number of instantaneous samples in which the behavior has been displayed, for each subject at each time point and day of observation | Takes into account the correlation among the behavioral categories | May result in factors that are difficult to interpret | Principal component analysis | Medium/high
Maternal behavior | Analysis of maternal behavior in a novel cage | Univariate | Latency to perform the behavior | - | - | Test for comparison among groups | Low
Maternal behavior | Analysis of maternal behavior in a novel cage | Univariate | Performance of the behavior (yes vs. no) and latency to perform the behavior | Takes into account the competition between mutually exclusive responses | Inappropriate for correlated data (e.g., littermates assigned to different treatment groups) | Survival analysis: Kaplan-Meier | Medium/high